Preface


Influence  of  Epigenetic  Phenomena  on  Gene  Expression 
and  Inheritance  of  Phenotypes 

One  of  the  many  definitions  of  an  epigenetic  mark  is  a  heritable  feature  that  does  not 
change  the  DNA  sequence  but  determines  when,  where,  and  to  what  extent  a  gene 
will  be  expressed.  Hence,  epigenetics  is  a  science  that  studies  DNA  packaging  and 
regulation  of  its  expression.  Although  often  introduced  as  a  new  science, 
epigenetics  dates  back  to  the  discovery  of  the  roles  of  chromatin  and  DNA 
methylation in controlling gene expression in the 1960s and 1970s.
Despite  the  intimate  relationship  between  DNA  and  epigenetic  factors,  mainstream 
studies  of  genetic  traits  in  humans  and  animal  models  have  largely  ignored  the 
existence  of  epigenetic  factors  during  the  past  decades,  while  the  epigenetics 
community,  although  part  of  both  the  genetics  and  developmental  biology  fields, 
was  digging  deeper  and  deeper  into  the  molecular  mechanisms  of  epigenetic 
phenomena  but  seldom  tackling  problems  of  complex  genetic  traits  in  mammals. 
One  of  the  reasons  for  the  dichotomy  is  the  very  complexity  of  complex  traits  where 
small  effects  from  multiple  loci  define  the  phenotype,  whereas  traditional  molecular 
biology  research  required  focusing  on  one  selected  target  at  a  time.  Another  reason 
was  the  lack  of  methodologies  capable  of  analyzing  large  amounts  of  epigenetic 
information  in  large  cohorts  of  patients  and  controls.  Nevertheless,  during  the  last 
two  decades,  in-depth  analysis  of  inheritance  patterns  combined  with  molecular 
approaches  in  a  number  of  animal  models,  such  as  agouti  viable  yellow  mice  and 
callipyge  sheep,  has  provided  remarkable  examples  of  how  the  interplay  between 
genetic  and  epigenetic  factors  can  generate  complex  traits. 

Rapid  technological  improvements  are  now  making  it  possible  to  measure 
epigenetic  signals  at  many  genomic  locations  in  an  unprecedented  way  and  conduct 
prior-hypothesis-free  epigenetic  studies.  Global  initiatives  such  as  the  International 
Human  Epigenome  Consortium  are  underway  to  obtain  high-resolution  maps  of 
histone modifications, DNA methylation, and transcription start sites and to
compare epigenome signals and the resulting transcriptional regulation in a wide variety
of tissues and different cell types. However, even hypothesis-free data analyses
require  knowledge  of  epigenetic  paradigms  to  make  informed  decisions  when 
interpreting  these  massive  data  sets. 

In  this  book,  we  have  focused  on  the  relationship  between  epigenetics  and 
complex  traits,  since  this  field  can  be  daunting  for  those  wishing  to  do  research. 
The biology is complex, and the ramifications of epigenetic regulation are
widespread. Epigenetic states may contribute to the penetrance of genetic
polymorphisms  or  mutations  and  thereby  modify  inheritance  patterns.  This  may 
result  in  apparently  non-Mendelian  inheritance  of  genetic  traits.  Epigenetic  changes 
in  an  individual  may  affect  several  different  generations,  depending  on  when  these 
changes  occur  and  in  which  cells.  Genetic  factors  will  influence  epigenetic  factors, 
and  possibly  their  transmission.  Effects  may  vary  depending  on  sex,  and  also  on  the 
sex of an implicated parent. Concepts used in genetics, such as heritability,
or  the  proportion  of  variance  explained  by  genetics,  can  now  be  expanded  to 
explicitly  consider  the  epigenetic  contributions.  Furthermore,  of  course,  different 
loci  may  demonstrate  different  associations  with  all  these  factors.  Design  of 
experiments  and  analysis  of  experimental  data  must  reflect  this  complexity  and 
be  carefully  approached. 

Therefore,  this  book  presents  14  detailed  and  distinct  views  on  the  interplay 
between  complex  traits  and  epigenetics.  The  chapters  are  grouped  into  three 
sections:  (1)  Fundamental  aspects  of  the  biology  in  epigenetics,  with  focus  on  the 
period  in  mammalian  development  that  is  pivotal  for  genetic  transmission,  i.e., 
gametogenesis  and  early  embryonic  development,  insight  into  how  the  epigenetic 
marks  are  established,  maintained,  and  transmitted  and  their  influence  on  gene 
expression; (2) The known impact of epigenetic factors on several different
complex traits and diseases of interest for human genetics; and (3) Approaches to
experimental  design  and  statistical  analysis  in  this  context. 

Our  hope  is  that  the  two  communities  of  basic  researchers  and  analysts  will  find 
mutual  enrichment  through  this  combination  of  material.  An  overview  of  available 
analytic  methods  and  their  underlying  assumptions  could  inform  experimental 
design  choices.  Similarly,  improved  understanding  of  the  biology  could  lead  to 
better  choices  for  analysis,  and  an  appreciation  for  the  many  factors  that  may  need 
to  be  considered.  Ultimately,  this  marriage  of  topics  could  lead  to  improved  study 
designs,  rich  and  complete  analytic  frameworks,  new  approaches  to  analysis,  and 
guidelines  for  interpretation. 

Of  course,  this  book  includes  only  a  small  overview  of  the  available  knowledge 
and  approaches,  yet  we  anticipate  that  this  will  be  a  helpful  first  reference  for 
researchers  entering  the  field,  and  will  stimulate  future  developments.  We  thank 
Springer for making this endeavor possible.


Chapter 1

Epigenetic  Reprogramming 
in  the  Mammalian  Germline 


Stephanie  Maupetit-Mehouas,  David  Nury,  and  Philippe  Arnaud 


Abstract Epigenetic modifications are crucial for maintaining and faithfully
transmitting the identity of each cell type during cell division. During mammalian
germ cell development, the acquisition of the ability to form a totipotent zygote is
associated with extensive epigenetic reprogramming that affects all major
developmental processes, including genomic imprinting, X-inactivation, retroelement
silencing and gene expression. The existing epigenetic patterns are first erased
during primordial germ cell development, followed by acquisition of a
germline-specific epigenetic signature that can be eventually transmitted to and interpreted by
the progeny. A better characterisation of the underlying mechanisms is relevant for
both fundamental and clinical research dealing with epigenetic inheritance,
epigenetic control of mammalian development and regenerative medicine. In this review
we present and discuss recent advances on the nature, mechanisms and consequences
of resetting the epigenetic pattern during primordial germ cell formation and
(re)acquiring a new set of epigenetic marks at later stages of germline development.


1.1  Introduction 


During  somatic  development  of  higher  organisms,  pluripotent  cells  progressively 
reduce  their  differentiation  potential  and  become  committed  to  a  particular  cell  fate 
with  specific  gene  expression  and  functional  profiles.  This  tightly  regulated  process 
requires  the  concerted  action  of  specific  factors  and  is  accompanied  or  caused  by 
dynamic chromatin changes that influence gene expression patterns and phenotype.
These changes occur at the level of DNA methylation, histone tail modifications,
nucleosome  remodelling  and  regulation  of  higher  order  chromatin  structures.  Most 
(but not all) of these modifications are heritable from one cell generation to the
next and are therefore referred to as epigenetic. Thus, each cell type in an organism
is  characterised  by  a  specific  and  stable  epigenetic  signature  (epigenome)  that  is 
transmitted  to  the  daughter  cells.  Once  specified,  the  epigenome  of  a  cell  type  is 
relatively  stable.  However,  in  mammals,  there  are  two  key  developmental  stages  in 
which  epigenetic  patterns  are  profoundly  modified,  with  erasure  of  the  existing 
epigenetic  marks  and  acquisition  of  a  new  set.  This  so-called  epigenetic 
reprogramming  occurs  first  in  early  embryogenesis,  following  fertilisation,  when 
the  epigenetic  information  carried  by  the  mature  gametes  is  removed  and  replaced 
by an embryonic/somatic signature at the peri-implantation stage. This "embryonic"
reprogramming is incomplete, as some genomic regions, notably the cis-acting
regulatory sequences of imprinted gene loci (imprinting control regions,
ICRs), escape this process. A more thorough epigenetic reprogramming occurs
during gametogenesis and affects virtually all epigenetic-based developmental
processes: genomic imprinting, X-inactivation, retroelement silencing and gene
expression. Understanding the underlying mechanisms is relevant for both
fundamental and clinical research. It will help to better define the role of
epigenetics in the control of mammalian development and to elucidate the
mechanism of in vitro-induced reprogramming/pluripotency.

This  review  focuses  on  the  germline  epigenetic  reprogramming  and  discusses 
recent  findings  on  the  mechanisms  involved  in  erasing  the  epigenetic  pattern  during 
primordial  germ  cell  (PGC)  formation  and  in  (re)acquiring  a  new  set  of  epigenetic 
marks  at  later  stages  of  germline  development. 


1.2    Temporal  and  Spatial  Dynamics 
of  Mouse  Germ  Cell  Development 

Among  all  the  cell  lineages  of  a  complex  organism,  only  germ  cells  can  give  rise  to 
a  new  individual,  allowing  the  transmission  of  genetic  and  possibly  epigenetic 
information  to  the  next  generation.  Germ  cell  development  initiates  with  the 
specification  of  PGCs,  which  following  colonisation  of  the  embryonic  gonads 
will  develop  into  oocytes  or  spermatozoa.  In  mammals,  most  of  our  knowledge 
on  the  temporal  and  spatial  dynamics  of  this  tightly  regulated  process  comes  from 
the  mouse  model  (Fig.  1.1). 

Unlike in non-mammalian species such as D. melanogaster and zebrafish,
mouse PGCs are not predetermined at fertilisation but are specified in the
post-implantation embryo. At embryonic day 4.5 (E4.5), following blastocyst
implantation, there is a rapid increase in the number of inner cell mass cells, leading to
the formation of the epiblast (the source of all the body cell lineages). Germ cell fate
is induced in the proximal epiblast in a dose-dependent manner by bone
morphogenetic protein (BMP) signals from the extra-embryonic ectoderm at
~E6.25 (Lawson et al. 1999). This leads to the formation of a pool of PGC
precursors of which only a limited number (about 6 cells), characterised by the
expression of the zinc finger transcriptional regulators BLIMP1 (B-lymphocyte-induced
maturation protein 1, also known as PR-domain-zinc-finger protein
1, PRDM1) and PRDM14 (PR-domain-zinc-finger protein 14), acquire a PGC
fate. As a result, "fate-determined" PGCs emerge at ~E7.25 as a cluster of
~20-40 cells located at the base of the forming allantois (Ginsburg et al. 1990;
Ohinata et al. 2005, 2009; Yamaji et al. 2008). From ~E7.5, PGCs migrate through
the hindgut and mesentery and start colonising the nascent genital ridges (i.e., the
future gonads) at ~E10.5. During this process, PGCs rapidly proliferate: from
around 100 PGCs at E8.5 to ~200 at E9.5 and ~600 at E10.5. In the genital ridges,
PGCs continue to proliferate up to E13.5 (~26,000 cells), when they stop dividing
(Mochizuki and Matsui 2010; Kagiwada et al. 2012) (Fig. 1.2).
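
The proliferation figures quoted above can be converted into approximate doubling times. The following Python sketch is purely illustrative: it assumes uninterrupted exponential growth with no cell death or cell loss during migration, treats the approximate cell counts as exact, and uses a hypothetical helper name (doubling_time_hours) that does not come from the cited studies.

```python
import math

# Approximate PGC numbers quoted in the text: (embryonic day, cell count).
pgc_counts = [(8.5, 100), (9.5, 200), (10.5, 600), (13.5, 26000)]


def doubling_time_hours(day_start, n_start, day_end, n_end):
    """Mean doubling time in hours between two time points.

    Assumes simple exponential growth (an idealisation, not a measured rate).
    """
    doublings = math.log2(n_end / n_start)
    return (day_end - day_start) * 24.0 / doublings


# Walk through consecutive pairs of time points.
for (d0, n0), (d1, n1) in zip(pgc_counts, pgc_counts[1:]):
    hours = doubling_time_hours(d0, n0, d1, n1)
    print(f"E{d0} to E{d1}: about {hours:.1f} h per doubling")
```

Under these assumptions, the mean doubling time is about 24 h between E8.5 and E9.5, roughly 15 h between E9.5 and E10.5, and roughly 13 h between E10.5 and E13.5. These are back-of-the-envelope values derived from rounded counts, not measurements.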

Following colonisation of the developing gonad, at E12.5, PGCs, now referred
to as germ cells (GCs), start differentiating into male or female gametes. In the
developing ovary, at E13.5, female GCs initiate meiosis I that will be blocked at
the diplotene stage of prophase I at about the time of birth and until puberty.
Following ovulation, the oocyte resumes meiosis I and halts in metaphase of
meiosis II that will be completed after fertilisation (Smallwood and Kelsey 2012).

Conversely,  male  GCs  do  not  initiate  meiosis  in  the  embryo  and  stop  dividing 
from  E13.5  (Western  et  al.  2008).  At  sexual  maturity,  male  GCs  will  differentiate 
into  spermatogonial  stem  cells  and  resume  mitotic  proliferation  to  form 
spermatocytes  that  will  give  rise,  following  meiosis,  to  haploid  spermatids  that 
will  develop  into  spermatozoa. 


1.3    Primordial  Germ  Cell  Development 
and  Reprogramming 

After  implantation,  epiblast  cells  mature  and  prepare  for  gastrulation  and  formation 
of  all  the  body  cell  lineages.  This  process  is  associated  with  major  epigenetic 
changes,  as  illustrated  by  the  genome-wide  increase  in  DNA  methylation  in 
pre-gastrulating  embryos  that  will  be  almost  complete  by  E6.5  (Borgel 
et al. 2010). Thus, by ~E6.25, the prospective PGCs have accumulated several
layers  of  epigenetic  information  and  are  already  primed  towards  a  somatic  fate. 
Upon  PGC  specification,  these  epigenetic  features  will  be  erased  through  a  major 
transcriptional  and  epigenetic  reprogramming  that  might  be  important  for  the 
production of a totipotent zygote following fertilisation.

Fig. 1.2 Temporal schematic of epigenetic reprogramming during mouse primordial germ cell
development. Genome-wide dynamics of DNA methylation and main histone modifications
during PGC development (mainly revealed by immunochemistry analysis) are depicted. The
dynamic expression of key epigenetic modifiers and pluripotency factors is also shown. Based
on Kurimoto et al. (2008a, b); Ancelin et al. (2006); Seki et al. (2005, 2007); Hajkova et al. (2008,
2010); Daujat et al. (2009); and Hackett et al. (2013). PGC: Primordial germ cells, BLIMP1:
B-lymphocyte-induced maturation protein 1, PRDM14: PR-domain-zinc-finger protein 14, SOX2:
SRY (sex-determining region Y)-box 2, KLF2: Kruppel-like factor 2, OCT4 or POU5F1: POU
class 5 homeobox 1, Dnmt: DNA (cytosine-5)-methyltransferase, NP95 or UHRF1: ubiquitin-like
with PHD and ring finger domains 1, PRMT5: protein arginine methyltransferase 5, Tet:
ten-eleven-translocation, Aid or Aicda: activation-induced cytidine deaminase, Apobec:
apolipoprotein B mRNA editing enzyme catalytic polypeptide, HIRA: histone cell cycle regulation
defective homolog A, NAP-1: nucleosome assembly protein 1, Gadd45a: growth arrest and
DNA-damage-inducible protein 45 alpha


1.3.1    Primordial Germ Cell Specification: Reprogramming
Their Transcription Pattern

PGC specification is associated with major changes in gene transcription that repress
the somatic cell program, activate the germ cell-specific program, restore
pluripotency potential and prepare for the imminent genome-wide epigenetic
reprogramming. This highly ordered process is regulated by BLIMP1 and
PRDM14. At ~E6.25 these transcriptional regulators co-mark epiblast cells that
will form PGCs, and in the absence of either of these proteins nascent PGC
precursors fail to develop properly (Ohinata et al. 2005; Yamaji et al. 2008; Vincent
et al. 2005; Kurimoto et al. 2008a, b). A single-cell microarray approach used to
establish the genome-wide transcription dynamics of developing PGCs and their
somatic neighbours from E6.25 to E8.25 revealed that germ cell specification
involves the up-regulation of nearly 500 "germ cell specification" genes and the
down-regulation of 330 "somatic program" genes (Kurimoto et al. 2008a). Among
the down-regulated "somatic" genes are many genes involved in embryonic
development (e.g., Hox genes, Dkk1, Cdx1 ...), cell cycle regulation (e.g., Ccne1,
Cdc25a ...) as well as DNA methylation and histone modification, such as the de
novo DNA methyltransferases DNMT3A and DNMT3B, the nuclear protein of
95 kDa (NP95, a factor essential to maintain the DNA methylation pattern during
cell division) and the H3K9me2 histone methyltransferase GLP (G9a-like protein).
Conversely, the "germ cell specification" category includes genes associated with
germ cell development, such as Stella or Fragilis, and also the pluripotency genes
Nanog, Sox2 (Sry-box2) and Klf2 (Kruppel-like factor 2) (Kurimoto et al. 2008a).

Further analysis conducted using BLIMP1-deficient PGC-like cells showed that
BLIMP1 functions as a dominant repressor of the somatic program and is also
involved in the reacquisition of the pluripotency potential and in the forthcoming
epigenetic reprogramming. On the other hand, PRDM14 is required for Sox2
up-regulation and Glp repression and is essential for the reacquisition of the
pluripotency potential and for epigenetic reprogramming (Yamaji et al. 2008;
Kurimoto et al. 2008b). Importantly, BLIMP1, although unnecessary to induce
Prdm14 expression, is strictly required for its maintenance (Yamaji et al. 2008).

How  precisely  these  two  proteins  regulate  germ  cell  specification  remains  to  be 
established. Both BLIMP1 and PRDM14 contain zinc-finger and histone
methyltransferase SET domains, but no associated histone-modifying activity has
been reported. Alternatively, they could exert their functions by recruiting effector
partners to their target genes. BLIMP1 can recruit different chromatin-modifying
proteins, such as histone deacetylases (HDAC) (Yu et al. 2000), G9A (Gyory
et al. 2004) and the arginine methyltransferase PRMT5 (Ancelin et al. 2006).
BLIMP1 and PRMT5 co-localise in the nuclei of migrating PGCs (Ancelin
et al. 2006); however, it is not known whether the putative BLIMP1/PRMT5
complex is formed also during PGC specification and whether it contributes to
repression of the somatic program.

Upon PGC specification, in addition to repressing the DNA methylation
machinery and Glp, PGCs express factors that are currently used for in vitro
somatic cell reprogramming. These include Sox2, Nanog, Lin28 and Klf2, which
are specifically up-regulated in PGCs, as well as Oct-4 (octamer-binding
transcription factor 4), which is strongly expressed already in epiblast cells and throughout
PGC development (Kurimoto et al. 2008a; West et al. 2009; Yeom et al. 1996)
(Fig. 1.2). The expression of these pluripotency factors might explain how
monopotent PGCs can develop into pluripotent embryonic germ cells in culture
(Matsui et al. 1992). In addition, it suggests that germ cell reprogramming shares
some similarities with the mechanism underlying the generation of induced
pluripotent stem cells (iPS). Nonetheless, the role of these pluripotency factors in PGC
specification and particularly their potential reprogramming function remain to be
determined (Gillich and Hayashi 2011). To date, germline-specific knockout
experiments have revealed that OCT-4 and NANOG are critical for PGC survival during
migration (Kehler et al. 2004; Chambers et al. 2007; Yamaguchi et al. 2009).


1.3.2    Epigenetic  Reprogramming  in  Early 
and  Late  Primordial  Germ  Cells 

When  the  germ  cell  fate  is  established  at  ~E7.25,  PGCs  display  an  epigenetic 
pattern  similar  to  that  of  the  surrounding  somatic  cells  (Seki  et  al.  2005,  2007; 
Hajkova  et  al.  2008;  Popp  et  al.  2010;  Guibert  et  al.  2012).  Epigenetic 
reprogramming  initiates  at  ~E7.5  and  will  lead  to  a  virtually  complete  loss  of 
DNA  methylation  in  the  germline  of  both  sexes  by  E13.5  (Fig.  1.2). 

Migrating PGCs undergo ordered epigenetic changes, leading to the
establishment of a germ cell-specific chromatin signature which is distinct from that of the
somatic neighbours. Immunochemistry approaches revealed that in PGCs most
H3K9me2, a repressive mark, is progressively removed in a cell-by-cell manner
between E7.75 and E8.75 (Seki et al. 2005, 2007; Hajkova et al. 2008), while DNA
methylation declines genome-wide (Seki et al. 2005, 2007). A recent study based
on a genome-wide bisulphite sequencing approach further supports the view that the bulk of
methylation erasure occurs prior to E9.5, showing an average methylation level at
CpG sites of ~71 % in E6.5 epiblast cells and ~30 % in E9.5 PGCs (Seisenberger
et al. 2012). Interestingly, some specific sequence classes, including X-linked,
imprinted and some germline-specific genes, partially escape this initial loss of
methylation (Seisenberger et al. 2012).

Conversely,  the  level  of  H3K9me3,  which  marks  centromeric  heterochromatin, 
remains  unchanged  during  PGC  migration.  H3K27me3,  another  repressive  mark 
that  is  mediated  by  the  polycomb  repressive  complex  2  (PRC2),  is  progressively 
up-regulated  from  ~E8.25,  with  most  PGCs  showing  high  H3K27me3  levels  by 
E9.5.  Ezh2,  which  encodes  a  histone  methyltransferase  belonging  to  PRC2,  is 
concomitantly expressed in PGCs and could be involved in this up-regulation.
However, Ezh2 is expressed also in somatic cells (Kurimoto et al. 2008a),
suggesting that the removal of H3K9me2 and/or reduction of DNA methylation is
a prerequisite for H3K27me3 acquisition. This hypothesis is supported by the
observation that Prdm14-/- PGCs, which cannot repress Glp and thus reduce
H3K9me2, also fail to up-regulate H3K27me3 (Yamaji et al. 2008). Consequently,
H3K27me3 enrichment should compensate for H3K9me2 reduction and ensure that
the PGC euchromatin is properly repressed (Seki et al. 2005). The
PRMT5-mediated symmetrical arginine methylation at histone H2A and H4
(H2A/H4R3me2s), a repressive mark that increases from E8.5 onward through the
action of the putative PRDM1-PRMT5 complex (Ancelin et al. 2006), might have a
similar role. These dynamic histone modifications highlight that the euchromatic
part of the PGC genome is deficient in repressive marks between ~E7.5 and ~E8.25.
However, this "repression-free" chromatin state probably does not result in
deregulated transcription as PGCs exhibit a transient loss of RNA polymerase II
(RNA polII) activity during this period (Seki et al. 2007). Accordingly, enrichment
for  histone  marks  associated  with  transcriptionally  active/permissive  chromatin, 
such  as  H3K4me2/3  and  H3K9ac,  is  observed  only  in  late-migrating  PGCs  and 
further  increases  when  they  enter  the  genital  ridges  (Seki  et  al.  2005;  Hajkova 
et  al.  2008). 

The down-regulation of the GLP histone methyltransferase and the DNA
methylation machinery, which occurs during PGC specification, probably accounts for
the  dramatic  decrease  in  H3K9me2  and  DNA  methylation  observed  by  E8.75, 
through  a  passive  process.  However,  the  finding  that  most  of  the  migrating  PGCs 
stop  dividing  at  the  G2  phase  of  the  cell  cycle  (G2  arrest)  between  E7.75  and  E8.75 
indicates  that  the  loss  of  these  repressive  marks  might  also  occur  independently  of 
DNA  replication  (see  Sect.  1.4). 
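
The passive-dilution argument can be made roughly quantitative with a simple model. The Python sketch below assumes that, in the absence of maintenance methylation, each round of DNA replication halves the average CpG methylation level. The optional maintenance parameter (the fraction of hemimethylated sites restored per round) and the function name are illustrative assumptions, not quantities reported in the studies cited here. The ~71 % and ~30 % levels are those reported for E6.5 epiblast cells and E9.5 PGCs (Seisenberger et al. 2012).

```python
import math


def divisions_for_passive_loss(start_pct, end_pct, maintenance=0.0):
    """Replication rounds needed to dilute methylation from start_pct to end_pct.

    maintenance is the assumed fraction of hemimethylated CpGs that are
    re-methylated after each round (0.0 = no maintenance at all).
    """
    if not 0.0 <= maintenance < 1.0:
        raise ValueError("maintenance must be in [0, 1)")
    # Fraction of methylation retained per replication round in this model.
    retained = 0.5 * (1.0 + maintenance)
    return math.log(end_pct / start_pct) / math.log(retained)


# Levels quoted in the text: ~71 % (E6.5 epiblast) and ~30 % (E9.5 PGCs).
no_maintenance = divisions_for_passive_loss(71.0, 30.0)
half_maintenance = divisions_for_passive_loss(71.0, 30.0, maintenance=0.5)
print(f"No maintenance: about {no_maintenance:.2f} replication rounds")
print(f"50 % maintenance (hypothetical): about {half_maintenance:.2f} rounds")
```

In this idealised model, slightly more than one round of replication without any maintenance would account for the observed drop, whereas partial maintenance would require more rounds. This is why a G2 arrest during part of this window bears on whether passive dilution alone is sufficient.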

These  dynamic  and  ordered  epigenetic  changes  in  migrating  PGCs  are  believed 
to  prepare  their  genome  for  the  second  wave  of  reprogramming  events  following 
their entry in the genital ridge at E10.5. This major reprogramming wave is first
characterised by a second phase of demethylation (Seki et al. 2005; Hajkova
et al. 2008) that affects the sequences that have retained their methylation in
migrating PGCs (Seisenberger et al. 2012), leading to global genome
hypomethylation in the germline of both sexes by E13.5. Genome-wide and
region-specific bisulphite-based approaches indeed confirmed the removal of
DNA methylation at the majority of the genome including coding and intergenic
regions, X-linked, imprinted and germline-specific genes as well as most
transposable elements and centromeric regions (Hajkova et al. 2002; Maatouk
et al. 2006; Yamagata et al. 2007; Popp et al. 2010; Henckel et al. 2011; Guibert
et al. 2012; Seisenberger et al. 2012), revealing that methylation erasure in PGCs is
more complete than in preimplantation embryos (Borgel et al. 2010; Smallwood
et al. 2011). Moreover, the onset of DNA demethylation precedes the massive
chromatin remodelling at ~E11.5 associated with transient loss of the linker histone
H1 and loss of H3K27me3, H3K64me3 and H3K9me3 histone modifications that
are subsequently regained by E12.5, while others, such as H3K9ac and
H2A/H4R3me2s, are stably removed (Hajkova et al. 2008; Daujat et al. 2009).
The simultaneous disappearance of several histone modifications suggests that
chromatin  remodelling  relies  on  a  histone  replacement  process.  Accordingly, 
HIRA  and  NAP-1,  two  chaperones  involved  in  histone  replacement,  concomitantly 
accumulate  in  PGC  nuclei  (Hajkova  et  al.  2008).  Histone  demethylases  could  also 
be  involved  in  this  process.  Indeed,  in  PGCs  deficient  for  UTX  (ubiquitously 
transcribed  tetratricopeptide  repeat  gene  on  X  chromosome),  an  H3K27me3 
demethylase,  H3K27me3  does  not  transiently  decrease  and  these  cells  cannot 
develop  properly  (Mansour  et  al.  2012). 

Thus, at around E12.5, following the removal of DNA methylation and several
histone  modifications,  GCs  display  a  basal  and  probably  naive  epigenomic 
signature  that  is  unique  during  mammalian  development. 


1.3.3    Epigenetic-Based  Developmental  Processes 
and  Primordial  Germline  Reprogramming 

Reprogramming  erases  the  epigenetic  program  acquired  by  ~E6.25  in  the  epiblast- 
derived PGC precursors. Genomic imprinting resetting and X-chromosome
reactivation in female GCs are the two main hallmarks of this phenomenon, which also affects
gene expression and retroelement regulation. However, as with DNA
methylation erasure, which occurs in a stepwise manner, these events do not happen all at the
same time.

In the female germline, X-chromosome reactivation was initially believed to
occur upon colonisation of the genital ridges (Tam et al. 1994). More recent studies
revealed that this event initiates already in nascent/migrating PGCs. It is triggered
by progressive down-regulation of Xist, a noncoding RNA essential for X
inactivation, from E7.0 to E10.5 (Sugimoto and Abe 2007) that leads to gradual loss of
H3K27me3 enrichment on the inactive X chromosome (De Napoles et al. 2007).
Reactivation of X-linked genes is completed in post-migratory GCs by E14.5 (Tam
et al. 1994; Sugimoto and Abe 2007).

Genomic imprinting resetting is restricted to post-migratory GCs. ICRs, the
cis-acting regions that control imprinted domains, retain most, though not all, of their
DNA methylation marks up to E10.5 before undergoing rapid demethylation that is
complete by E13.5 (Hajkova et al. 2002; Lee et al. 2002; Seisenberger et al. 2012;
Kagiwada  et  al.  2012).  Elegant  studies  conducted  on  cloned  embryos  produced  from 
single  PGC  nuclei  demonstrated  that  during  this  time  window  functional  imprints  are 
lost  at  different  imprinted  loci  at  different  times.  This  timing  correlates  with  erasure  of 
DNA  methylation  at  the  associated  ICRs  (Kato  et  al.  1999;  Lee  et  al.  2002). 

Besides  the  effects  on  X-linked  and  imprinted  genes,  it  is  less  clear  whether  PGC 
reprogramming  and  the  associated  extensive  erasure  of  DNA  methylation  affect 
gene  expression  in  general.  A  recent  study  based  on  an  Me-DIP  approach  coupled 
to  microarray  analysis  showed  that  only  a  limited  number  of  promoters  are 
methylated in E7.5 epiblast cells (Guibert et al. 2012), similarly to what is observed
in somatic cells (Weber and Schübeler 2007; Mohn et al. 2008; Borgel et al. 2010).
Virtually all these promoters undergo demethylation in developing PGCs, probably
from E8.5 (Guibert et al. 2012; Seisenberger et al. 2012). Although most of them
control genes that are not expressed in PGCs, a subset are associated with
pluripotency and germline-specific factors, suggesting that DNA demethylation
could control the germline expression of these specific transcripts (Guibert
et al. 2012; Hackett et al. 2012a; Seisenberger et al. 2012; Yamaguchi
et al. 2012). The recent identification of a set of germline-specific genes that are
apparently primarily regulated by DNA methylation further supports this
hypothesis. Consistently, demethylation of their promoters in migratory or post-migratory
PGCs leads to gene expression activation. Interestingly, this set of genes, which
includes Tex19.1 and Mili, is involved in the genome defence mechanism against
parasitic elements (Hackett et al. 2012a). Expression in post-migratory PGCs of
other germline-specific genes related to meiosis and germ cell functions is also
concomitant with erasure of DNA methylation on their promoters (Maatouk
et al. 2006; Seisenberger et al. 2012; Yamaguchi et al. 2012).

Taken together, these findings suggest that germline-specific genes are activated by
promoter demethylation at different times of PGC development, some during PGC
migration (~E8.5) and others upon genital ridge colonisation at ~E11.0 (Maatouk
et al. 2006; Guibert et al. 2012; Hackett et al. 2012a; Seisenberger et al. 2012).
Similarly, activation of the pluripotency genes is associated with promoter
demethylation in migrating PGCs; however, their subsequent repression in late GCs (by E16.5)
occurs in a DNA methylation-independent manner (Seisenberger et al. 2012).
Although less documented, mainly due to technical constraints, histone modifications
could also be involved in the control of germline-specific gene expression. For
instance, Dhx38, which is thought to be repressed by PRMT5-mediated
H2A/H4R3me2s in E8.5 PGCs, starts to be expressed at ~E11.5, at the time of the
genome-wide removal of this mark (Ancelin et al. 2006).

DNA methylation has a key role in repressing the potentially mutagenic
transcriptional activity of transposable elements, which make up about 50 % of the
mammalian genome. Most transposable elements are retroelements that can insert
into new positions in the genome via a "copy-and-paste" mechanism that involves
RNA intermediates (Zamudio and Bourc'his 2010). Like the rest of the genome,
these sequences undergo DNA methylation erasure in developing PGCs (Popp
et al. 2010; Guibert et al. 2012). The LINE family L1 (long interspersed element
1), for instance, is demethylated in post-migratory PGCs at ~E11.5 (Lane
et al. 2003). However, some transposable elements partially resist demethylation.
These include the IAPs and a young subfamily of LTR-ERV1 retroelements
(Hajkova et al. 2002; Lane et al. 2003; Popp et al. 2010; Guibert et al. 2012). It is
not clear to what extent demethylated retroelements are transcriptionally
derepressed. One might expect that, to ensure their propagation on an evolutionary
scale, transposable elements have to be expressed in GCs. Intriguingly, however, a
recent study based on an RNA-seq approach revealed that demethylation is not
associated with a general transcriptional activation of the L1 family in either female or
male GCs by E13.5. A burst of L1 expression is observed later, at E16.5, and
specifically in female GCs (Seisenberger et al. 2012). This observation suggests that
mechanisms other than DNA methylation repress L1 expression. Among other
possibilities, this repression could act at the post-transcriptional level. Genes
involved in the host genome defence mechanism, such as Tex19.1 and Mili, are
indeed already active in GCs before E13.5. Transcription of demethylated
transposable elements might reveal them to the genome defence mechanism,
leading to their repression via post-transcriptional mechanisms before their
transcriptional silencing in late GCs (Hackett et al. 2012a) (see Sect. 1.5.3). This system
could maintain genome integrity during germ cell development and would
ensure that all active transposable elements are effectively repressed.


1.4    Mechanisms  of  DNA  Demethylation  in  PGCs 

The  mechanisms  leading  to  DNA  demethylation  in  mammals  are  the  subject  of  a 
long-lasting  debate.  The  simplest  mechanism  relies  on  the  absence  of  the  DNA 
methylation  machinery.  In  this  so-called  passive  demethylation  process,  lack  of 
DNA  methylation  maintenance  during  cell  replication  leads  to  its  progressive 
dilution  over  cell  divisions.  However,  DNA  demethylation  can  also  occur  in  a 
replication-independent  manner,  indicating  that  "active"  demethylation  exists  in 
mammalian cells. Different mechanisms can account for active DNA demethylation,
all involving DNA repair through the nucleotide excision repair (NER) or the
base excision repair (BER) pathways (reviewed in Wu and Zhang 2010 and Niehrs
and Schäfer 2012). NER is a multistep process that relies on several proteins and
involves the removal of a ~30-nucleotide single-stranded DNA segment that includes the
damaged  nucleotide.  The  resulting  gap  is  filled  in  by  DNA  polymerases  and  DNA 
ligase  seals  the  nicks.  The  BER  machinery  is  a  two-step  process  in  which  a  specific 
DNA  glycosylase  recognises  and  removes  the  targeted  base  from  the  DNA;  gap 
filling  and  nick  sealing  are  ensured  by  DNA  polymerases  and  a  DNA  ligase. 
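
The passive dilution described above lends itself to a small worked example. The sketch below is an illustrative simplification, not a model taken from the studies cited here: it assumes that each replication pairs a parental strand with a newly synthesised, unmethylated strand, and that a hypothetical `maintenance_efficiency` parameter (0 for no maintenance, 1 for perfect maintenance) sets the fraction of parental methylation copied onto the new strand. The starting value of 0.75 is only an example input.

```python
def passive_dilution(initial_fraction, divisions, maintenance_efficiency=0.0):
    """Return the average methylation fraction before and after each division.

    Simplified assumption: after replication the parental strand keeps its
    methylation and the new strand receives `maintenance_efficiency` times
    that amount, so the per-strand average is multiplied by
    (1 + maintenance_efficiency) / 2 at every division.
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must lie between 0 and 1")
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must lie between 0 and 1")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")

    retained_per_division = (1.0 + maintenance_efficiency) / 2.0
    levels = [initial_fraction]
    for _ in range(divisions):
        levels.append(levels[-1] * retained_per_division)
    return levels


if __name__ == "__main__":
    # No maintenance at all: the level halves at every division.
    for n, level in enumerate(passive_dilution(0.75, 3)):
        print(f"after {n} division(s): {level:.4f}")
```

Under these assumptions the level halves at each division when maintenance is absent, so a few divisions are enough for an extensive loss, whereas perfect maintenance keeps it constant.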

Passive  demethylation  probably  occurs  in  migrating  PGCs  (Fig.  1.3).  The 
transient  down-regulation  of  DNMT1  between  E7.5  and  E8.25  and  the  stable 
repression of its cofactor NP95, which is essential for DNA methylation
maintenance (Sharif et al. 2007), probably account for the observed decline in DNA
methylation (Seki et al. 2005). However, because most PGCs enter G2 arrest
between E7.75 and E8.75, active demethylation might also occur and could involve
the growth-arrest and DNA-damage-inducible protein 45 alpha (GADD45a), which
mediates DNA demethylation of specific sequences through the NER or the BER
pathway in mammalian and non-mammalian cells (Niehrs and Schäfer 2012;
Barreto et al. 2007; Schmitz et al. 2009). Gadd45a is up-regulated in PGCs at the
time of fate determination and its expression is possibly maintained after G2 arrest
(Mochizuki and Matsui 2010; Kurimoto et al. 2008b). However, it is not known
whether components of the NER or the BER pathway are also present at this stage.
In addition, Gadd45a-/- mice do not show defects in fertility, although the PGC
(de)methylation pattern was not explored in these mutants (Engel et al. 2009).

Fig. 1.3 Possible mechanisms of cytosine demethylation during mouse PGC development.
During genome-wide PGC DNA demethylation, methylated cytosines (5mC) are converted to
cytosines through a passive or an active mechanism, each involving different pathways (see text
for details). Absence of the DNA methylation maintenance machinery (i.e., DNMT and cofactors)
or its inability to recognise hydroxylated derivatives of 5mC (i.e., 5hmC, 5fC and 5caC) leads to
passive dilution of methylation during cell replication. Active replication-independent
demethylation in germ cells is believed to involve DNA repair through the base excision repair (BER)
pathway. This is initially triggered by deamination of 5mC or 5hmC into thymines or 5hmU,
respectively, followed by thymine excision by DNA glycosylase. Hydroxylated derivatives of
5mC can also be the target of active demethylation via excision by DNA glycosylases followed
by BER. In the emerging picture (symbolised by bold arrows), passive demethylation primarily
accounts for the loss of methylation in PGCs. This process is mainly driven by the absence of DNA
methylation maintenance in an active cell proliferation context and can further be accentuated
at specific loci by a prior Ten-Eleven-Translocation-mediated hydroxylation. C: cytosine, 5mC:
5-methylcytosine, 5caC: 5-carboxylcytosine, 5fC: 5-formylcytosine, 5hmC: 5-hydroxymethylcytosine,
5hmU: 5-hydroxymethyluracil, AID: activation-induced deaminase, APOBEC1:
apolipoprotein B mRNA editing enzyme catalytic polypeptide 1, G: guanine, MBD4: methyl
CpG-binding domain protein 4, SMUG1: single-strand selective monofunctional uracil DNA
glycosylase, T: thymine, TDG: thymine DNA glycosylase, TET: Ten-Eleven-Translocation


Both  active  (Hajkova  et  al.  2008,  2010)  and  passive  (Guibert  et  al.  2012; 
Seisenberger  et  al.  2012;  Kagiwada  et  al.  2012;  Yamaguchi  et  al.  2012)  processes 
have  been  proposed  to  account  for  the  second  DNA  demethylation  step  that  takes 
place  after  PGC  entry  in  the  genital  ridges. 

The  initial  observation  that  this  second  genome-wide  DNA  demethylation  wave 
seems  to  occur  rapidly  within  a  single  G2  phase  suggested  that  it  relies  on  an  active, 
replication-independent mechanism (Hajkova et al. 2008). In support of such an
active DNA demethylation process, several components of the BER, but not the
NER, machinery are active in PGC nuclei at ~E11.5 (Hajkova et al. 2010).
BER-mediated  DNA  demethylation  occurs  in  plants  where  several  glycosylases 
that  recognise  and  remove  5-methylcytosines  have  been  identified  (Gehring  and 
Henikoff  2007).  In  mammals,  however,  this  class  of  glycosylases  has  not  been 
characterised  yet.  It  is  thus  proposed  that  5-methylcytosines  are  first  deaminated 
into  thymines  and  the  resulting  T:G  mismatches  are  then  targeted  by  thymine 
glycosylases,  such  as  MBD4  (methyl  CpG-binding  domain  protein  4)  or  TDG 
(thymine-DNA glycosylase), which also activate the BER machinery. AID and
APOBEC1, which are expressed in PGCs (Morgan et al. 2004; Hajkova
et al. 2010), are possible candidate deaminases. Genetic ablation of Aid revealed that
this enzyme contributes to DNA demethylation in PGCs (Popp et al. 2010). However,
its role is limited, as substantial demethylation still occurs in its absence. At E13.5,
female and male Aid-/- GCs show 20 and 22 % DNA methylation, respectively,
which is higher than the 8 and 16 % observed in wild-type PGCs, but far from the
~75 % observed in somatic neighbours (Popp et al. 2010). Aid expression in PGCs is
detected only from E12.5 (Hajkova et al. 2010), which also argues against a role for
this enzyme in DNA demethylation at E11.5. Moreover, AID- and APOBEC1-mutant
mice are viable and fertile (Popp et al. 2010; Morrison et al. 1996). These
observations question the involvement of DNA deamination (at least mediated by
AID and APOBEC1) in the massive loss of DNA methylation observed at ~E11.5.
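
The arithmetic behind the statement that AID plays only a limited role can be made explicit. The sketch below uses only the percentages quoted above (Popp et al. 2010) and assumes, purely for illustration, that the ~75 % level of somatic neighbours is a reasonable proxy for the starting methylation level; the function name `erasure_retained` is hypothetical.

```python
def erasure_retained(somatic_pct, wild_type_pct, mutant_pct):
    """Fraction of the wild-type methylation loss that still occurs in the mutant.

    All arguments are methylation levels in percent. The somatic level is
    used as an assumed starting point for both genotypes.
    """
    wild_type_loss = somatic_pct - wild_type_pct
    if wild_type_loss <= 0:
        raise ValueError("wild-type level must be below the somatic level")
    mutant_loss = somatic_pct - mutant_pct
    return mutant_loss / wild_type_loss


SOMATIC_PCT = 75.0  # approximate level in somatic neighbours (assumed baseline)

# E13.5 values quoted in the text: wild type 8 % (female) and 16 % (male),
# Aid-deficient 20 % (female) and 22 % (male).
female = erasure_retained(SOMATIC_PCT, wild_type_pct=8.0, mutant_pct=20.0)
male = erasure_retained(SOMATIC_PCT, wild_type_pct=16.0, mutant_pct=22.0)

if __name__ == "__main__":
    print(f"female: {female:.2f}, male: {male:.2f}")
```

Under this assumption, roughly 0.82 (female) and 0.90 (male) of the wild-type loss is still observed without AID, which is consistent with the limited contribution described above.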

The  identification  of  the  ten-eleven-translocation  (TET)  family  of 
5mC-dioxygenases  provides  an  alternative  mechanism  for  DNA  demethylation. 
TET  proteins  (TET1,  TET2  and  TET3)  convert  5-methylcytosine  (5mC)  into 
5-hydroxymethylcytosine  (5hmC)  and  further  catalyse  the  oxidation  of  5hmC  to 
5-formylcytosine  (5fC)  and  5-carboxylcytosine  (5caC)  (Tahiliani  et  al.  2009; 
Ito  et  al.  2011;  He  et  al.  2011).  Several  findings  support  the  idea  that  these 
modifications  are  cytosine  demethylation  intermediates.  Specifically,  in  the  zygote, 
the  rapid  erasure  of  DNA  methylation  from  the  male  pronucleus  coincides  with  a 
concomitant  gain  of  5hmC,  5fC  and  5caC  (Iqbal  et  al.  2011;  Wossidlo  et  al.  2011; 
Inoue  and  Zhang  2011).  Maternal  depletion  of  TET3  impairs  paternal  genome 
demethylation  in  preimplantation  embryos  and  leads  to  neonatal  lethality  (Guo 
et al. 2011). A similar pathway could also account for germline DNA demethylation.
A recent study indeed revealed that loss of DNA methylation from PGCs at ~E10.5
coincides with a concomitant gain of 5hmC (Hackett et al. 2013), likely mediated
by TET1 and TET2, which are expressed in PGCs and peak at this specific stage
(Hajkova et al. 2010; Hackett et al. 2013). However, if these two factors are indeed
involved in DNA methylation erasure, one might expect that they are functionally
redundant (Hackett et al. 2013), as TET1-deficient mice are fertile, although less so
than wild type (Dawlaty et al. 2011), and the absence of TET1 only marginally impairs
genome-wide DNA demethylation in E13.5 PGCs (Yamaguchi et al. 2012).

Such  TET-mediated  demethylation  can  occur  through  several  pathways 
(Fig.  1.3).  In  a  deamination-independent  mechanism,  5hmC,  5fC  and  5caC  could 
be  directly  recognised  and  removed  by  DNA  glycosylases,  followed  by  BER. 
Biochemical  approaches  have  demonstrated  that  5fC  and  5caC,  but  not  5hmC,  can 
be  removed  from  DNA  by  TDG  (Maiti  and  Drohat  2011;  Hashimoto  et  al.  2012). 
Deamination could also convert 5hmC into 5-hydroxymethyluracil (5hmU), which can be
removed by a DNA glycosylase and repaired by BER. TDG and single-strand-selective
monofunctional uracil DNA glycosylase 1 (SMUG1) excise 5hmU
in vivo (Guo et al. 2011; Cortellino et al. 2011). However, if one or both of these
scenarios  indeed  occur  in  PGCs,  the  identity  of  the  involved  deaminases  and/or 
glycosylases  remains  to  be  determined.  As  discussed  above,  AID  and  APOBEC-1 
as  well  as  TDG,  which  is  not  detected  between  E10.5  and  E13.5  (Hajkova  et  al.  2010), 
are  unlikely  to  account  for  the  massive  demethylation  at  ~E11.5.  Further 
investigations  on  potential  candidates,  including  other  deaminases  of  the  APOBEC 
family  or  glycosylases  such  as  MBD4  and  SMUG-1,  should  clarify  this  issue. 

A  probably  more  relevant  possibility  is  that  the  TET-mediated  5mC  derivatives 
are  not  recognised  by  the  maintenance  methylation  machinery,  as  documented  for 
5hmC  (Valinluck  and  Sowers  2007),  leading  to  passive  demethylation.  Recent 
reports  indeed  revealed  that  such  a  mechanism  probably  accounts  for  demethylation 
of  imprinted  and  meiotic  genes  (Hackett  et  al.  2013;  Yamaguchi  et  al.  2012). 
This is also consistent with other studies indicating that genome-wide DNA
demethylation in PGCs occurs primarily via a replication-coupled passive mechanism,
mediated by the absence of DNA methylation maintenance (Kagiwada
et  al.  2012;  Seisenberger  et  al.  2012). 

To  summarise,  the  erasure  of  DNA  methylation  in  developing  PGCs  is  likely  to 
involve  multiple  passive  and  active  mechanisms,  possibly  in  a  locus-specific 
manner.  In  the  emerging  model,  a  replication-coupled  passive  mechanism  primarily 
accounts  for  the  loss  of  methylation  in  PGCs.  This  process  is  mainly  driven  by 
the  absence  of  DNA  methylation  maintenance  in  an  active  cell  proliferation  context 
and can further be accentuated at specific loci by a prior TET-mediated hydroxylation.
In addition, active DNA demethylation, mediated by the BER pathway,
could  act  as  an  auxiliary  mechanism  at  a  limited  number  of  sequences  (Fig.  1.3). 


1.5    Setting  Up  New  Epigenetic  Patterns  in  Germ  Cells 

Erasure  of  the  epigenetic  pattern  in  PGCs  is  completed  at  ~E12.5.  From  this  "naive" 
state,  the  now  so-called  GCs  start  to  differentiate  into  male  or  female  gametes.  This 
process  is  associated  with  the  acquisition  of  a  new,  sex-specific  epigenetic  profile, 
as monitored by the dynamics of DNA methylation.


1.5.1    Targets  and  Timing  of  Acquisition  of  DNA 
Methylation  in  Germ  Cells 

Mature oocytes and sperm show distinct DNA methylation patterns. Overall, the
sperm genome is more methylated than the oocyte genome (Howlett and
Reik 1991). A genome-wide bisulphite sequencing approach that analysed virtually
all cytosines of the genome showed an average methylation level of ~90 % in sperm
cells and of ~40 % in mature oocytes (Kobayashi et al. 2012). This male-female
asymmetry is also observed in specific genomic regions. Recent exhaustive maps of
cytosine methylation distribution in sperm and oocytes showed that, unlike in
somatic cells, specific regions called CpG islands (CGIs) are prone
to be methylated in germ cells. In somatic mammalian cells, DNA methylation
occurs almost exclusively at CpG dinucleotides throughout the genome (Auclair
and Weber 2012), but not at CGIs, although they are characterised by high CpG
density. These usually short genomic regions (from ~200 bp up to several kb in
length) are mostly associated with promoter regions (60 % of the about 23,000
CGIs identified in the mouse genome) (Smallwood and Kelsey 2012) and, as a rule,
they are not methylated regardless of the activity of the associated promoter.
Conversely, ~1,330 CGIs are specifically methylated in oocytes and ~350 in sperm
(Kobayashi et al. 2012), thus representing regions with germline-specific methylation
(Borgel et al. 2010; Smallwood et al. 2011; Kobayashi et al. 2012; Smith
et  al.  2012).  This  feature  was  previously  described  only  for  ICRs  that  are 
characterised  as  germline-inherited  differentially  methylated  regions  (gDMRs) 
(Arnaud  2010)  and  are  a  small  minority  of  the  methylated  CGIs  (about  23  imprinted 
gDMRs  are  referenced  to  date)  (Arnaud  2010;  Proudhon  et  al.  2012).  Imprinted 
gDMRs  maintain  DNA  methylation  upon  fertilisation  and  throughout  development. 
By this means, DNA methylation marks the parental origin of the allele in somatic
tissues  and  mediates  the  mono-allelic  expression  of  the  associated  imprinted 
gene(s).  By  contrast,  at  most  non-imprinted  CGIs  that  are  methylated  in  gametes, 
DNA  methylation  is  totally  or  mostly  erased  during  preimplantation 
reprogramming and they do not display parental-allelic methylation after implantation
(Borgel et al. 2010; Smallwood et al. 2011; Kobayashi et al. 2012; Proudhon
et al. 2012). The role of gamete-specific CGI methylation remains to be formally
established.  Transcriptome  analysis  of  oocytes  before  and  after  acquisition  of  DNA 
methylation  failed  to  show  significant  differences,  questioning  the  role  of  CGI 
methylation  in  oocyte  gene  expression  (Smallwood  et  al.  2011).  In  addition, 
absence  of  DNA  methylation  does  not  result  in  infertility  per  se  (Kaneda 
et al. 2004). As not all CGI-associated DNA methylation is erased at the preimplantation
stage, it could affect the level of gene expression in preimplantation
embryos  and  influence  first  lineage  specification  (Smallwood  et  al.  2011; 
Smallwood  and  Kelsey  2012). 

Besides  CGIs,  sex-specific  DNA  methylation  is  also  observed  at  transposable 
elements that are highly methylated in sperm and moderately methylated in oocytes.
For instance, long interspersed elements (LINE) as well as short interspersed
elements (SINE), LTR retroelements (long terminal repeat retroelements) and
DNA transposons display an average methylation level of 80-85 % in sperm and
40 % in oocytes (Kobayashi et al. 2012).

This  DNA  methylation  landscape  is  acquired  at  different  times  and  distinct 
cellular  contexts  in  paternal  and  maternal  GCs,  as  exemplified  by  the  dynamics  of 
methylation  at  imprinted  gDMRs  and  transposable  elements  (Fig.  1.4).  In  the  male 
germline,  parental  imprints  and  methylation  at  transposable  elements  are  acquired  in 


[Fig. 1.4 graphic: developmental timeline of the mouse germline from ~E7.0 through birth to puberty. Legible panel labels: resetting of the epigenetic pattern (e.g. erasure of DNA methylation); reactivation of the inactive X chromosome; meiosis and recombination; methylation acquisition; meiosis arrest at diplotene of prophase I; folliculogenesis; meiosis completion. Factors shown include DNMT3A/3L, the piRNA pathway, an H3K4 demethylase (KDM1B, other?) and transcription-through events.]

Fig.  1.4  Timing  and  actors  of  DNA  methylation  acquisition  in  the  developing  female  and  male 
mouse  germline.  Factors  known  to  be  involved  in  DNA  methylation  acquisition  at  single-copy 
sequences  and  repetitive  elements  are  shown.  The  DNA  methylation  machinery  is  mainly 
constituted  by  DNMT3L  and  DNMT3A  and  to  a  lesser  extent  by  DNMT3B,  which  is  required 
for  methylation  of  some  repetitive  element  families  (see  text  for  details).  The  main  germline 
developmental  stages  are  indicated 


mitotically  arrested  GCs.  Methylation  initiates  before  birth,  in  pro-spermatogonia 
at  ~E14.5,  and  is  completed  in  peri-natal  pro-spermatogonia  and  is  maintained 
through  the  meiotic  and  haploid  stages  (Davis  et  al.  2000;  Kato  et  al.  2007; 
Ichiyanagi  et  al.  2011;  Henckel  et  al.  2011).  Conversely,  maternal  imprints  and 
CGI  methylation  are  acquired  after  birth  in  growing  oocytes  arrested  in  meiotic 
prophase  I  (with  a  4n  DNA  content)  (Lucifero  et  al.  2004;  Hiura  et  al.  2006; 
Smallwood  et  al.  2011). 

Importantly,  the  successful  completion  of  meiosis  in  the  male  and,  to  a  lesser 
extent,  in  the  female  germline  is  dependent  on  the  re-establishment  of  DNA 
methylation  (see  Sect.  1.5.3). 


1.5.2    Mechanism  of  Germline  DNA  Methylation 

Establishment  at  Imprinted  gDMRs  and  CGIs 

Our knowledge of the mechanism involved in DNA methylation acquisition at
imprinted gDMRs has greatly increased in recent years, especially for the
gDMRs in the female germline (mat-gDMRs) that account for most of the
imprinted gDMRs identified so far (about 20 mat-gDMRs in the mouse) (Arnaud
2010; Proudhon et al. 2012). Most are promoters and all fulfil the criteria of a CGI,
suggesting that the mechanism involved in their DNA methylation could also account
for the methylation of other, non-imprinted CGIs in the genome (Smallwood and
Kelsey 2012). Pat-gDMRs, which, unlike mat-gDMRs, are CpG-poor non-promoter
regions, are also treated in this section. Three pat-gDMRs have been characterised
(H19-DMR, IG-DMR and Rasgrf1-DMR), while a fourth one, which was initially
identified at the Zdbf2 locus, remains to be firmly validated (Kobayashi et al. 2009,
2012; Proudhon et al. 2012).

The  DNA  methylation  machinery  involved  in  germline  de  novo  methylation  is 
well  characterised  (Fig.  1.4).  A  key  factor  is  DNMT3L,  a  non-catalytic  protein  that 
belongs  to  the  de  novo  methyltransferase  3  family,  which  includes  DNMT3A  and 
DNMT3B.  Biochemical  studies  demonstrated  that  DNMT3L,  by  interacting  with 
DNMT3A  or  B,  stimulates  their  methyltransferase  activity  (Chedin  et  al.  2002; 
Suetake  et  al.  2004).  These  three  factors  are  all  expressed  in  the  germline,  but  only 
DNMT3L and DNMT3A are involved in the methylation of mat-gDMRs and CGIs
in oocytes (Bourc'his et al. 2001; Hata et al. 2002; Kaneda et al. 2004; Smallwood
et al. 2011; Kobayashi et al. 2012). In the male germline, DNMT3B also contributes
together with DNMT3A to the methylation of pat-gDMRs, as reported for the
Rasgrf1-DMR (Kato et al. 2007), and these proteins probably catalyse CGI de
novo  methylation  in  sperm  cells  as  well. 

Many recent studies give insights into how the DNMT3A/DNMT3L
complex targets genomic sequences in a germline-specific manner. The emerging
picture  is  that  there  is  not  a  universal  mechanism  as  different  factors,  such  as  the 
KRAB  zinc  finger  protein  ZFP57,  are  specifically  involved  in  each  germline  and/or 
in  a  subset  of  genomic  targets  (for  details  see  Arnaud  2010;  Smallwood  and  Kelsey 
2012).  Nonetheless  all  seem  to  share  a  common  core  pathway  with  three  main 
features:  primary  sequence  specificity,  chromatin  configuration  and  transcriptional 
events. 

The  primary  sequence  and  more  specifically  the  spacing  between  two  CpGs  in 
the  target  sequences  have  been  the  focus  of  much  research.  Structural  analysis 
showed  that  the  DNMT3A/3L  complex  preferentially  methylates  CpGs  that  are 
8-10  bp  apart  (Jia  et  al.  2007),  suggesting  that  genomic  sequences  with  this 
periodicity  may  be  preferential  de  novo  methylation  targets.  However,  in  the 
germline,  such  a  periodicity  cannot  on  its  own  determine  which  CGI  has  to  be 
methylated  as  it  is  found  at  most  CGIs,  regardless  of  their  methylation  status, 
in mouse oocytes (Ferguson-Smith and Greally 2007; Smallwood et al. 2011).
In addition, sequences that do not possess this 8-10 bp CpG spacing, such as the
pat-gDMRs,  can  also  be  de  novo  methylated  in  the  germline. 
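
The notion of CpG spacing discussed above can be illustrated with a short script. This is a minimal sketch, not the analysis used in the studies cited: it assumes that "spacing" means the distance in base pairs between the cytosines of two CpG dinucleotides on the same strand, it counts all CpG pairs (not only adjacent ones) whose distance falls within the window, and the function names are hypothetical.

```python
def cpg_positions(sequence):
    """Return the 0-based positions of the C in every CpG dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def count_spaced_cpg_pairs(sequence, min_gap=8, max_gap=10):
    """Count CpG pairs whose C-to-C distance lies between min_gap and max_gap bp."""
    positions = cpg_positions(sequence)
    count = 0
    for index, first in enumerate(positions):
        for second in positions[index + 1:]:
            distance = second - first
            if distance > max_gap:
                break  # positions are sorted, so later CpGs are even further away
            if distance >= min_gap:
                count += 1
    return count


if __name__ == "__main__":
    # Toy sequence with CpGs at positions 0, 8 and 18 (distances 8 and 10).
    toy = "CG" + "A" * 6 + "CG" + "A" * 8 + "CG"
    print(cpg_positions(toy), count_spaced_cpg_pairs(toy))
```

As the text notes, such a count alone cannot predict methylation status, because this periodicity is found at most CGIs whether or not they are methylated.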

The chromatin configuration, particularly the histone marks at lysine 4 and 36 of
histone H3 (H3K4 and H3K36, respectively), is probably more relevant for
germline  de  novo  DNA  methylation  acquisition.  Biochemical  approaches  have 
shown  that  DNMT3L  and  DNMT3A  interact  with  histone  H3  only  when  it  is 
unmethylated  on  lysine  4,  suggesting  that  genomic  sequences  enriched  for 
H3K4me  cannot  recruit  the  de  novo  methylation  machinery  (Ooi  et  al.  2007; 
Zhang  et  al.  2010).  This  is  in  agreement  with  the  observation  that  DNA  methylation 
and  H3K4me2/3  are  never  associated  in  mammalian  genomes  (Barski  et  al.  2007; 
Mikkelsen et al. 2007; Mohn et al. 2008) and that methylated CGIs are depleted of
H3K4me3 in oocytes (Smallwood et al. 2011). The hypothesis that H3K4 methylation
has to be removed to allow DNA methylation is indirectly supported by
functional evidence. In oocytes deficient for KDM1B (an H3K4me1/2 demethylase),
a  subset  of  mat-gDMRs  failed  to  acquire  DNA  methylation  (Ciccone  et  al.  2009). 
Although  DNA  methylation  at  non-imprinted  CGIs  was  not  assessed  in  this 
study,  one  can  predict  that  removal  of  H3K4me,  by  KDM1B  or  other  H3K4me 
demethylases,  is  involved  in  germline  DNA  methylation  establishment  both  at 
imprinted  gDMRs  and  non-imprinted  CGIs.  Besides  H3K4me,  H3K36  methylation 
may also play a role in the de novo methylation mechanism. In vitro, DNMT3A, but
not  DNMT3B,  interacts  with  H3K36me3  (Dhayalan  et  al.  2010),  suggesting  that 
this  mark  could  recruit  the  DNMT3A/DNMT3L  complex  in  vivo.  Moreover, 
H3K36me3  association  with  transcriptionally  active  regions  in  mammalian 
genomes  (Barski  et  al.  2007;  Mikkelsen  et  al.  2007)  provides  a  mechanistic  link 
between  transcription  and  germline  DNA  methylation  establishment.  Evidence  for 
this  comes  from  a  study  conducted  on  the  mouse  Gnas  locus  that  contains  two 
mat-gDMRs  (Coombes  et  al.  2003).  Truncation  of  Nesp,  which  initiates  upstream 
and  overlaps  with  the  two  Gnas  mat-gDMRs,  disrupts  the  acquisition  of  DNA 
methylation  at  these  regions  in  oocytes  (Chotalia  et  al.  2009).  Similarly,  in  the 
human  GNAS  locus,  absence  of  DNA  methylation  at  the  mat-gDMRs  correlates 
with  deletion  of  the  NESP  promoter  region  (Bastepe  et  al.  2005).  These  findings 
suggest  that  transcription  across  gDMRs  and  CGIs  is  mechanistically  involved  in  the 
establishment  of  their  DNA  methylation  in  the  germline.  Consistent  with  this  model, 
most  mat-gDMRs  and  CGIs  that  are  methylated  in  oocytes  are  located  in  intragenic 
regions  (Smallwood  et  al.  2011;  Kobayashi  et  al.  2012;  Chotalia  et  al.  2009). 
Moreover,  transcription  across  several  mat-gDMRs  is  observed  in  growing  oocytes, 
when  methylation  is  acquired  (Chotalia  et  al.  2009).  Similarly,  transcription  across 
the H19-DMR and IG-DMR is detected in E15.5 and E17.5 pro-spermatogonia,
concomitantly  with  the  acquisition  of  DNA  methylation  at  these  pat-gDMRs, 
suggesting  that  this  mechanism  could  also  play  a  role  in  the  paternal  germline 
(Henckel  et  al.  2011).  Thus,  transcription  and  the  associated  deposition  of 
H3K36me3  could  "open"  the  chromatin  and  facilitate  the  recruitment  of  the  DNA 
methylation  machinery. 

Therefore, in the emerging model, transcriptional read-through events, removal
of  H3K4me  and  gain  of  H3K36me3  act  in  a  concerted  manner  to  recruit  DNMT3A 
and  DNMT3L  (Fig.  1.5).  The  exact  temporal  relationship  and  interdependence  of 
these  events  remain  to  be  formally  established.  For  instance,  it  would  be  important 
to  determine  whether  transcriptional  read-through  and  removal  of  H3K4me  are 
functionally  linked  or  independent  events.  The  observation  that  in  human  cells 
KDM1B  is  complexed  with  factors  involved  in  transcription  elongation  favours  the 
first  hypothesis  (Fang  et  al.  2010). 

Any  model  to  explain  how  de  novo  DNA  methylation  occurs  must  also  take 
into  account  the  fact  that  most  CGIs  remain  unmethylated  in  the  germline.  In 
pro-spermatogonia,  when  pat-gDMRs  acquire  their  DNA  methylation, 
mat-gDMRs  are  characterised  by  active  promoters  and  are  enriched  for 
Fig. 1.5 Model of germline DNA methylation establishment. Erasure of H3K4me associated
with transcription-through events is required for de novo methylation (upper part). In this model
H3K36me3 occurs subsequently, but the exact temporal relationship and interdependence between
these three events remain to be established. Conversely, promoter activity could prevent H3K4me
erasure and recruitment of the DNA methylation machinery (lower part). K4me2/3: Di- or
trimethylated lysine 4 of histone H3. K4me0: Unmethylated lysine 4 of histone H3. K36me3:
Trimethylated lysine 36 of histone H3. RNA pol II: RNA polymerase II


H3K4me3  (Henckel  et  al.  2011).  This  observation  suggests  that,  in  the  germline, 
imprinted  gDMRs  and  CGIs  that  are  also  active  promoters  are  protected  from  DNA 
methylation  (Fig.  1.5).  Promoter  activity  could  impair  the  association  of  a  H3K4 
demethylase,  thus  preventing  the  recruitment  of  the  DNA  methylation  machinery. 

This model can explain methylation at imprinted and non-imprinted CGIs in
oocytes and at the paternally methylated H19-DMR and IG-DMR. However,
methylation in the male germline can also rely on a small RNA-based pathway.
This pathway, which is well documented at transposable elements (see Sect. 1.5.3), is also
observed at the Rasgrf1-DMR, where acquisition of DNA methylation involves
small RNAs and depends on the expression of components of the PIWI-interacting
RNA (piRNA) pathway (Watanabe et al. 2011).


1.5.3    Mechanism  of  Germline  DNA  Methylation 
Establishment  at  Transposable  Elements 

DNMT3L is the core component of the de novo DNA methylation machinery
at transposable elements. In Dnmt3L-/- oocytes or pro-spermatogonia, all
transposable element families show reduced DNA methylation (Bourc'his and
Bestor 2004; Webster et al. 2005; Kato et al. 2007; Kaneda et al. 2010; Kobayashi
et al. 2012). DNMT3A and DNMT3B, which are also involved in this process, have
common and differential target specificity. DNMT3A methylates SINE B1 whereas
both DNMT3A and B are required for IAP methylation. LINE methylation relies on
DNMT3A in the female germline and both DNMT3A and B in the male germline
(Kato et al. 2007; Kaneda et al. 2010; Ichiyanagi et al. 2011). DNMT3A and/or
DNMT3B could also act independently of DNMT3L, particularly at SINE B1 in the
male germline and at retroelements with high CpG density in oocytes (Ichiyanagi
et al. 2011; Kobayashi et al. 2012).

In the male germline, Dnmt3L genetic ablation induced transcriptional reactivation
of IAP and LINE retrotransposons and caused severe developmental defects
characterised by meiotic failure (possibly due to illegitimate recombination
between aberrantly unmethylated non-homologous retrotransposons), progressive
loss of germ cells and ultimately arrest of spermatogenesis with complete
azoospermia (Bourc'his and Bestor 2004; Webster et al. 2005; Hata et al. 2006).
A  similar  phenotype  was  also  reported  in  Dnmt3a  conditional  mutant  males  (Kaneda 
et  al.  2004)  and  is  reminiscent  of  the  defects  observed  in  mutants  for  components 
of  the  piRNA/PIWI  pathway  (reviewed  in  Zamudio  and  Bourc'his  2010). 

PiRNAs are a class of small RNAs of ~24-30 nucleotides in length that are
specifically  expressed  in  the  germline  of  various  vertebrate  and  invertebrate  species 
(Banisch  et  al.  2012).  They  associate  with  the  PIWI  subfamily  of  Argonaute 
proteins  (Girard  et  al.  2006).  MILI  and  MIWI2,  two  of  the  three  mouse  PIWI 
family  members,  are  expressed  in  foetal  testes  where  most  piRNAs  are  derived 
from  retrotransposon  elements  (Aravin  et  al.  2006;  Kuramochi-Miyagawa 
et  al.  2008).  MILI  is  expressed  in  male  GCs  from  E12.5  to  the  round  spermatid 
stage  and  MIWI2  from  E15.5  until  birth.  In  MILI-  and  MIWI2-deficient  foetal 
testes,  a  marked  decrease  in  piRNA  production  correlates  with  defective  de  novo 
methylation and derepression of L1 and IAP elements (Carmell et al. 2007;
Kuramochi-Miyagawa et al. 2008; Aravin et al. 2008). Like Dnmt3L-/- mice,
these mutants cannot complete spermatogenesis and are infertile. A similar
phenotype was also observed in mice in which components of the piRNA/PIWI pathway
were  ablated,  such  as  the  tudor  domain-containing  proteins  TDRD1  and  TDRD9  or 
the  RNA  helicase  MVH  (mouse  vasa  homolog)  (Zamudio  and  Bourc'his  2010; 
Kuramochi-Miyagawa  et  al.  2010).  Altogether,  these  observations  suggest  that  in 
foetal  testes,  retrotransposon-derived  piRNAs  play  a  pivotal  role  in  de  novo  DNA 
methylation  and  silencing.  According  to  the  current  working  model,  following 
erasure  of  DNA  methylation,  retroelements  are  expressed  from  dispersed  loci  in 
late  PGCs.  The  occurrence  of  bidirectional  transcription  across  some  loci  would 
lead  to  the  production  of  both  sense  and  antisense  retrotransposon-containing 
transcripts.  These  transcripts  will  be  sensed  by  the  PIWI  proteins  and  cleaved 
into primary sense and secondary antisense piRNAs, which will associate,
respectively, with MILI and MIWI2 in distinct cytoplasmic compartments. Exchange of
piRNAs between these cytoplasmic compartments triggers an auto-amplification
process that increases the pool of sense and antisense piRNAs and might lead to
post-transcriptional silencing of retrotransposons. Translocation of MIWI2 and the
associated antisense piRNAs to the nucleus promotes de novo DNA methylation of
complementary  genomic  copies  of  retrotransposons.  The  underlying  molecular 
mechanism  remains  unclear.  Interactome  approaches  failed  to  reveal  a  physical 
interaction  between  MIWI2  and  the  DNA  methylation  machinery,  suggesting  that 
an intermediate step is required to recruit DNMTs (Castaneda et al. 2011). As in the
model  of  de  novo  methylation  at  single  copy  CGI  sequences  (Fig.  1.5),  the  MIWI2/ 
secondary  piRNA  complex  might  alter  the  histone  modification  signature  at 
retrotransposons, for instance by recruiting an H3K4me demethylase that
subsequently facilitates the recruitment of the DNA methylation machinery.

The  mechanism  that  controls  transposable  element  DNA  methylation  in  the 
female  germline  is  less  well  characterised.  It  is  not  known  whether  alteration  of 
DNA methylation at transposable elements in Dnmt3a-/- or Dnmt3L-/- oocytes is
associated with increased transcription. Nonetheless, these oocytes are apparently
normal  and  support  fertilisation  (although  the  derived  embryos  die  in  utero) 
(Bourc'his  et  al.  2001;  Hata  et  al.  2002;  Kaneda  et  al.  2004;  Kobayashi 
et  al.  2012).  PiRNAs  and  small  interfering  RNAs  (siRNAs),  these  latter  being 
specifically  derived  from  transposable  elements,  are  also  abundant  in  developing 
mouse  oocytes  at  the  time  of  DNA  methylation  establishment  (Watanabe 
et al. 2006, 2008, 2011). However, transposable element expression level is barely
affected in oocytes deficient for MILI and DICER, and Mili-/- and Miwi2-/-
female mice are fertile (Watanabe et al. 2006; Shoji et al. 2009). Although it is
not  known  whether  in  these  mutants  DNA  methylation  at  transposable  elements  is 
affected,  these  observations  suggest  that  in  the  female  germline  the  RNA-based 
pathway  is  not  involved  in  the  acquisition  of  a  repressive  signature  at  transposable 
elements. Indeed, a study that focused on lymphoid-specific helicase (LSH) rather
supports a chromatin-based mechanism. This member of the SNF2-helicase family
of chromatin remodelling proteins is involved in DNA methylation and
transcriptional repression of transposable elements in mammalian somatic cells (Dennis
et al. 2001). Interestingly, Lsh-/- female mice also show demethylation of IAP
elements in pachytene oocytes, increased expression of such elements in Lsh-/-
ovaries  and  methylation  defects  in  pericentromeric  satellite  repeats  (De  la  Fuente 
et  al.  2006).  This  phenotype  is  associated  with  incomplete  synapsis  of  homologous 
chromosomes  and  severe  oocyte  loss  in  the  early  postnatal  stages,  suggesting  that 
LSH  is  required  to  complete  female  meiosis  (De  la  Fuente  et  al.  2006).  Whether 
LSH  also  controls  methylation  at  other  transposable  elements  and  whether  it  is 
involved  in  the  acquisition  and/or  maintenance  of  DNA  methylation  in  the  female 
germline  is  not  known. 

The sex-specific differences in DNA methylation establishment at transposable
elements could be explained by the fact that the female germline might tolerate the
presence of transposable element-derived transcripts, so that their repression is
less crucial (Aravin and Bourc'his 2008; Zamudio and Bourc'his 2010). Indeed, because
active cell division facilitates (retro-)transposition events, these events are less likely
to occur in cell cycle-arrested growing oocytes.


1.5.4    Chromatin Changes During Gamete Maturation

During  the  last  step  of  maturation,  gametes  undergo  developmental  changes  to 
sustain  fertilisation  and  the  first  stages  of  zygotic  development  (Sasaki  and  Matsui 
2008).  This  is  predominantly  observed  in  the  male  germline  where  post-meiotic 
spermiogenesis  is  associated  with  global  chromatin  remodelling  (reviewed  in  Kota 
and  Feil  2010).  The  main  outcome  of  this  process  is  the  exchange  of  canonical 
histones  for  protamines  (small  basic  proteins),  resulting  in  tightly  condensed  sperm 
DNA. The compacted genome in the sperm head is believed to facilitate its motility
and protect it from DNA damage (Jenkins and Carrell 2012). This event raised the question of
whether and how male gametes could transmit the epigenetic information encoded
in histones and their associated modifications. The answer came with the finding that in
human  mature  spermatozoa,  5-15  %  of  their  genome  remains  nucleosome  bound 
(about  1  %  in  the  mouse)  (Wykes  and  Krawetz  2003;  Hammoud  et  al.  2009; 
Brykczynska  et  al.  2010).  Histone  retention  is  not  random  and  it  is  found,  for 
instance,  at  the  regulatory  regions  of  key  developmental  genes,  including  imprinted 
genes,  microRNAs  and  homeotic  genes.  Interestingly,  the  promoters  of  genes 
involved in spermatogenesis and cell homeostasis are enriched for H3K4me2/3
(permissive histone modifications), in agreement with their activation during
gametogenesis. In contrast, key genes for embryonic development or morphogenesis
harbour the repressive H3K27me3 mark and, in some cases, the permissive H3K4me2 mark.
These so-called bivalent chromatin domains, initially identified in embryonic stem
cells, are believed to poise genes for either activation or repression later in
development (Bernstein et al. 2006). Altogether, this suggests that male gametes can
convey  instructive  epigenetic  information  to  the  zygote,  which  can  subsequently 
regulate  expression  of  key  embryonic  developmental  genes. 

Alterations  in  histone  retention  associated  with  moderate  changes  in  the  amount 
of  H3K4me2/3  and  H3K27me3  at  some  developmental  genes  and  imprinted  loci 
have  been  detected  in  sperm  of  infertile  men.  These  defects  were  also  associated 
with  DNA  methylation  alterations,  though  to  a  limited  extent  (Hammoud 
et  al.  2011),  as  previously  reported  for  imprinted  genes  (Marques  et  al.  2004, 
2008; Kobayashi et al. 2007). These findings suggest that subtle, naturally
occurring changes in the chromatin signature of spermatozoa could account for the intra-
and inter-individual differences in DNA methylation detected in human sperm
samples (Flanagan et al. 2006) and might thus contribute to phenotypic
differences in the progeny.


1.6    Escaping  the  Germline  Epigenetic  Reprogramming 

Germline  reprogramming  affects  all  the  epigenetic-based  developmental  processes. 
During  PGC  development,  the  epigenetic  program  of  epiblast  cells  is  fully  erased 
and  at  ~E12.5,  the  germ  cell  genome  reaches  a  basal  (virtually  epigenetic-free)  state. 
In addition to preventing epimutations from being passed on to the next generation, this event is
also required for the totipotency potential of the future gametes (Hackett
et al. 2012b). The epigenetically naive germ cell genome will then acquire a
germline-specific epigenetic signature that, in addition to being important for
gametogenesis itself, can be transmitted to and interpreted by the progeny. This is
illustrated, for instance, by genomic imprinting. The paternal and maternal imprints
inherited from epiblast cells are erased in PGCs and a new set of sex-specific
imprints is acquired by the developing germlines. This epigenetic signature is
faithfully maintained following fertilisation and is transmitted to and interpreted by
the somatic cell lineages of the new individual.

Besides  this  well-characterised  form  of  transmission  of  epigenetic  information 
from  one  generation  to  the  next  one,  the  recurrent  observation  that  some 
transposable  elements  can  escape  DNA  methylation  reprogramming  suggests  that 
epigenetic  inheritance  might  also  occur  across  several  generations.  IAPs  and  the 
subfamily  of  LTR-ERV1  retroelements,  which  have  been  shown  to  resist  both 
germline and preimplantation reprogramming, are the prototype of such
transgenerational inheritance of an epigenetic state that could also affect
neighbouring sequences (Hajkova et al. 2002; Lane et al. 2003; Popp et al. 2010;
Guibert  et  al.  2012;  Kobayashi  et  al.  2012).  This  is  in  line  with  the  few  known  cases 
of  transgenerational  epimutation  in  mammals  that  involve  IAP  elements  and  can 
alter  the  expression  of  neighbouring  genes  in  a  variegated  manner  (Daxinger  and 
Whitelaw  2012). 

Besides transposable elements, recent genome-wide analyses identified a few
single-copy sequences that resist both waves of DNA methylation erasure in
PGCs and preimplantation embryos (Smallwood et al. 2011; Guibert et al. 2012).
In a pioneering study, about 20 such sequences, including intergenic, intragenic and
promoter regions, were formally identified (Guibert et al. 2012) and others
remain to be revealed (Seisenberger et al. 2012; Hackett et al. 2013); however, their
function (if any), particularly during embryo development, remains to be
established. Nonetheless, these observations indicate that the DNA methylation
pattern at single-copy sequences can be transmitted across generations.

Several  lines  of  evidence  suggest  that  besides  DNA  methylation,  the  epigenetic 
signature  is  not  fully  erased  in  PGCs.  Indeed,  in  the  developing  germline,  the  timing 
of  methylation  acquisition  at  imprinted  gDMRs  can  be  different  at  the  two  alleles 
according  to  their  parental  origin.  For  instance,  in  male  germ  cells,  methylation  at 
the  H19-DMR  is  first  reacquired  on  the  paternally  inherited  allele  (Davis 
et  al.  2000).  Similarly,  in  the  female  germline  several  mat-gDMRs  first  acquire 
methylation  on  the  maternally  inherited  allele  (Lucifero  et  al.  2004;  Hiura 
et  al.  2006).  This  suggests  that  following  DNA  methylation  erasure,  alleles  can 
nevertheless  remember  their  parental  origin,  possibly  through  other  epigenetic 
modifications. As a mirror event, it is also documented that defects in the
acquisition of germline methylation at imprinted mat-gDMRs can be partially rescued
during embryonic development. This is mainly observed at the Snrpn-DMR in
embryos derived from Dnmt3L-/- or Zfp57-/- oocytes. Although DNMT3L
and ZFP57 are crucial for its oocyte methylation, in some of these embryos
the Snrpn-DMR is methylated on the maternal allele (Arnaud et al. 2006;
Li et al. 2008). This suggests that a yet-to-be-identified germline-derived signature
escapes preimplantation reprogramming and mediates parental allele-specific DNA
methylation acquisition during early embryonic development.

Although these observations are not associated with transgenerational epigenetic
inheritance per se, they provide proof of concept that a DNA
methylation-independent epigenetic signature could be transmitted across generations.


1.7  Conclusions 


Recent years have witnessed major breakthroughs in our understanding of the
mechanisms involved in germline reprogramming, especially of how the new
epigenetic signature is acquired. The emergence of dedicated deep-sequencing
technologies  that  allowed  the  generation  of  an  exhaustive  map  of  DNA  methylation 
from  a  limited  amount  of  genomic  DNA  has  been  instrumental  in  these  advances. 
The  use  of  these  technologies  to  assess  migrating  and  early-post-migrating  PGCs, 
combined  with  the  recently  developed  genome-wide  approaches  to  profile  5hmC 
(Schuler  and  Miller  2012),  should  provide,  in  the  very  near  future,  key  information 
on  the  kinetics,  extent  and  mechanisms  of  germline  DNA  methylation  erasure. 
These data will be important for distinguishing between passive and active
demethylation mechanisms, establishing the role of hydroxylated 5mC derivatives in this
process  and  revealing  the  nature  and  identity  of  all  the  genomic  sequences  that 
resist  erasure.  However,  DNA  methylation  is  not  the  only  component  of  the 
epigenetic  signature.  Study  of  germline  reprogramming  should  also  rely  on  the 
optimisation  of  "scaled-down"  approaches  to  determine  in  vivo  the  dynamics  of 
histone modifications (by ChIP-seq) and transcription events (by RNA-seq) in the
developing germline. Besides these highly demanding and challenging in vivo
approaches,  the  development  of  cell  culture  models  should  provide  a  versatile 
tool  to  analyse  the  molecular  events  of  germ  cell  development  and  reprogramming. 
The  improvement  of  the  already  existing  in  vitro  or  ex  vivo  models  towards  a  model 
that faithfully recapitulates germ cell development will provide a powerful
controlled system to monitor all reprogramming steps (Gillich and Hayashi 2011).

These  efforts  should  lead  to  the  identification  of  the  key  factors  involved  in  the 
resetting and (re)acquisition of the epigenetic pattern. Besides its fundamental
importance for our understanding of epigenetic inheritance, this knowledge will
have major clinical impacts with potential applications in reproductive and
regenerative medicine. In addition, it will help to tackle the fascinating question of
whether the environment can influence reprogramming between generations. As
most of the germline reprogramming occurs during embryonic life in the womb,
one might speculate that environmental factors to which mothers are exposed
during pregnancy could affect the phenotype of not only their children but also
their grandchildren.


Chapter 2

Establishment  of  Tissue-Specific  Epigenetic 
States  During  Development 

Ionel  Sandovici 


Abstract Complex organisms require tissue-specific transcriptional programs,
which are acquired during development through the stepwise activation of
transcription factor networks acting in tight coordination with epigenetic mechanisms.
Recent progress in genome-wide mapping of various epigenetic marks across a
panel of mammalian cell types and developmental stages, together with a multitude
of functional analyses, has led to significant advances in our understanding of
tissue-specific epigenetic regulation of gene expression. These new developments open at
last the opportunity to systematically explore the contribution of epigenetics to
human disease.


2.1  Introduction 


Complex  organisms  such  as  mammals  comprise  over  200  different  cell  types,  each 
one  expressing  specific  sets  of  genes  that  define  their  unique  functions  (Alberts 
et  al.  2002).  The  tissue-specific  transcriptional  programs  are  acquired  over  the 
course  of  development,  during  which  cells  transit  from  a  pluripotent  state  to 
differentiated  cell  lineages,  in  a  well-orchestrated  spatial  and  temporal  manner. 
This stepwise process is controlled by the sequential activation of specific
transcription factors acting coordinately with epigenetic mechanisms (Reik et al. 2001;
Hemberger  et  al.  2009;  Albert  and  Peters  2009),  which  particularly  target  key  DNA 
regulatory  sequences  such  as  promoters,  enhancers,  and  insulators  (Maston 
et  al.  2006). 

Epigenetics refers to heritable changes in gene expression, caused by
mechanisms  other  than  changes  in  the  underlying  DNA  sequence  (Bird  2007). 
These  heritable  changes  in  gene  expression  are  brought  about  by  a  complex  array 
of  reversible  epigenetic  marks:  DNA  modifications  (such  as  5-methylcytosine 
(Suzuki  and  Bird  2008),  5-hydroxymethylcytosine  (Tahiliani  et  al.  2009), 
5-formylcytosine,  and  5-carboxylcytosine  (Ito  et  al.  2011)),  posttranslational 
modifications  of  histones  (such  as  methylation,  acetylation,  phosphorylation, 
ubiquitination,  sumoylation,  ADP  ribosylation,  deimination,  proline  isomerization) 
(Bannister and Kouzarides 2011), histone variants (Talbert and Henikoff 2010), and
alternative  nucleosome  positioning  (Bai  and  Morozov  2010).  The  epigenetic  marks 
are  laid  on  the  chromatin  by  an  array  of  chromatin-  and  DNA-binding  proteins  with 
enzymatic  activities  (Brenner  and  Fuks  2007),  as  well  as  noncoding  RNAs  (Grewal 
2010),  all  of  which  act  as  epigenetic  initiators  (epigenators)  (Berger  et  al.  2009). 

In  the  past  few  years  significant  advances  in  our  understanding  of  tissue-specific 
epigenetic regulation of gene expression have been made possible by
loss-of-function studies, as well as genome-wide mapping of different epigenetic marks
across  a  panel  of  mammalian  cell  types  and  developmental  stages.  A  seminal 
contribution  in  this  direction  has  been  provided  by  the  Encyclopedia  of  DNA 
Elements  (ENCODE)  project.  Very  recently,  the  human  ENCODE  project 
has  achieved  the  systematic  characterization  of  a  large  variety  of  epigenetic 
marks  in  147  different  cell  types  (ENCODE  Project  Consortium  2012).  Moreover, 
the  integration  of  ENCODE  data  with  other  resources  such  as  the  genome-wide 
association  studies  (GWAS)  has  started  to  reveal  new  insights  into  human  disease 
(Maurano  et  al.  2012).  In  this  chapter  I  summarize  the  current  view  on 
the  establishment  of  tissue-specific  epigenetic  marks  during  development,  how 
these epigenetic patterns are correlated with transcription in a cell-specific manner,
and  how  the  tissue-specific  epigenetic  states  may  be  directly  linked  with  human 
disease. 


2.2    Epigenetic  Reprogramming  During  Preimplantation 
Development 

The  life  of  an  organism  begins  at  fertilization  with  the  formation  of  the  zygote. 
Fertilization  coincides  with  a  wave  of  epigenetic  reprogramming  (Fig.  2.1)  that  is 
required  for  the  achievement  of  developmental  totipotency  (Reik  et  al.  2001; 
Hemberger et al. 2009; Albert and Peters 2009). Interestingly, even before
fertilization the oocyte exhibits global hypomethylation, particularly at specific families
of long interspersed element 1 (LINE1) and long terminal repeat (LTR)
retroelements (Smith et al. 2012). A major initial event in the post-fertilization
reprogramming process is the active loss of DNA methylation in the paternal
pronucleus (Santos et al. 2002), likely by partial conversion of 5-methylcytosine
into 5-hydroxymethylcytosine by TET3 (ten-eleven translocation) protein
(Wossidlo et al. 2011; Gu et al. 2011). This is followed by passive DNA
demethylation of the maternal pronucleus, facilitated by the exclusion of DNA
methyltransferase 1 (DNMT1) from the nucleus (Howell et al. 2001), as well as by
TET-mediated hydroxylation (Inoue and Zhang 2011). However, imprinting
control regions (ICRs), oocyte- and sperm-contributed differentially methylated
regions (DMRs), as well as several families of repeats, such as class II intracisternal
A-particles (IAPs) and L1Md_A elements, retain high levels of DNA methylation
throughout preimplantation development (Lane et al. 2003; Smith et al. 2012).
Minimum levels of DNA methylation are reached at the blastocyst stage, followed
by postimplantation gain of methylation to typical somatic levels (Smith
et al. 2012).

Fig. 2.1 Epigenetic reprogramming in early embryos (see text for details). (a) Diagram of the first
events during preimplantation development. XEN: extraembryonic endoderm stem cells;
ES: embryonic stem cells; TE: trophectoderm; TS: trophoblast stem cells. (b) Dynamics of
DNA modification changes during early development: blue, paternal pronucleus; red, maternal
pronucleus. 5mC: 5-methylcytosine; 5hmC: 5-hydroxymethylcytosine. (c) Histone variants and
histone modification dynamics during early development (adapted from Hemberger et al. (2009);
Albert and Peters (2009)). Prm: protamines; H3.1, H3.2, H3.3: histone H3 variants; H2AX,
H2AZ, mH2A, and H2A: histone H2A variants; K: lysine; me: methylation; PRC1: Polycomb
group (PcG) repressive complex 1

Genome-wide  reprogramming  of  histone  modifications  also  occurs  during  the 
preimplantation development (Fig. 2.1). Immediately after fertilization, the
paternal pronucleus is stripped of sperm-specific proteins called protamines and
repackaged with maternally stored histone variant H3.3 that is usually associated
with actively transcribed chromatin regions (Torres-Padilla et al. 2006).
Interestingly, deposition of H3.3 into the paternal genome by the histone chaperone
regulator  A  (HIRA)  is  an  important  event  for  the  establishment  of  pericentric 
heterochromatin,  which  is  required  for  proper  chromosome  segregation  during 
the  first  mitosis  (van  der  Heijden  et  al.  2005;  Santenard  et  al.  2010).  Only  a  few 
hours  later,  during  the  first  DNA  replication,  the  canonical  histone  H3  variants  are 
incorporated  for  the  first  time  into  the  paternal  genome  (Santenard  et  al.  2010). 
Histone  H3.3  within  the  male  pronucleus  becomes  trimethylated  at  lysine 
27 (H3K27me3) and this repressive histone mark, together with H3K9me1
(monomethylation  of  lysine  9)  retained  in  pericentromeric  regions  and  residual 
DNA  methylation,  serves  as  a  substrate  for  pericentric  heterochromatin  formation 
mediated  by  the  Polycomb  group  (PcG)  repressive  complex  1  (PRC1)  (Puschendorf 
et  al.  2008).  In  the  female  pronucleus  histone  H3.3  transiently  disappears  and  is 
replaced  by  histone  H3.2  (Akiyama  et  al.  2011).  After  the  two-cell  stage,  H3.1  and 
H3.3  variants  re-localize  to  heterochromatin  and  euchromatin,  respectively 
(Akiyama  et  al.  2011).  The  pericentric  heterochromatin  in  the  female  pronucleus 
is  marked  with  the  repressive  histone  marks  H3K9me3,  H4K20me3,  and 
H3K64me3 and binds the HP1β (heterochromatin protein 1 beta) protein (Santos et al. 2005;
Probst  et  al.  2007;  Daujat  et  al.  2009).  Histone  H2A  variants  are  also  reprogrammed 
during  the  preimplantation  development.  H2AZ  (important  for  gene  silencing), 
macroH2A  (associated  with  heterochromatic  regions  and  inactive  X  chromosome 
in  females),  and  the  canonical  H2A  are  not  incorporated  into  chromatin  during  the 
early  cleavage  stages,  and  are  possibly  even  actively  removed  after  fertilization.  In 
contrast,  H2AX  (implicated  in  DNA  repair)  is  particularly  enriched  in  early 
embryos  (Nashun  et  al.  2010).  Together,  all  the  reprogramming  events  described 
above  are  thought  to  contribute  to  the  efficient  acquisition  of  totipotency  during 
preimplantation  development. 

The earliest sign of cell differentiation occurs at the blastocyst stage (embryonic
day 3.5 [E3.5] in mouse and embryonic day 5 in human), with the specification of the
inner cell mass (ICM) and the trophectoderm (TE). This event coincides with the
first wave of de novo DNA methylation. As a result, TE is relatively hypomethylated
compared  with  ICM,  as  revealed  by  the  5-methylcytosine  staining  (Santos 
et  al.  2002).  Similar  to  DNA  methylation,  several  histone  modifications,  including 
H3K27  methylation,  H3K9Ac  (histone  H3,  lysine  9  acetylation),  H4  acetylation, 
and  H3K9  methylation,  also  exhibit  asymmetry  between  ICM  and  TE,  either  at 
the global level or at specific loci (Erhardt et al. 2003; Sarmento et al. 2004; O'Neill
et al. 2006). The ICM is then separated during the late blastocyst stage into epiblast,
which will form the future embryo proper, and primitive endoderm, which contributes,
together  with  the  trophoblast  cells  derived  from  TE,  to  the  formation  of  the 
extraembryonic  tissues  (Reik  et  al.  2001).  Some  cells  derived  from  the  primitive 
endoderm  also  contribute  to  the  formation  of  the  definitive  embryonic  endoderm 
(Kwon  et  al.  2008).  After  the  formation  of  the  three  lineages  (epiblast,  primitive 
endoderm,  and  trophoblast),  the  cells  undergo  successive  steps  of  differentiation  to 
form  all  cell  types  of  the  organism,  including  placenta. 


2.3    Epigenetic  Regulation  of  Pluripotency 

The  three  lineages  at  the  blastocyst  stage  have  been  used  for  derivation  of  distinct 
stem cell types that can be maintained in vitro (Fig. 2.1): trophoblast stem (TS) cells
from  TE  (Tanaka  et  al.  1998),  extraembryonic  endoderm  stem  (XEN)  cells  from  the 
primitive  endoderm  (Kunath  et  al.  2005),  and  embryonic  stem  (ES)  cells  from 
the  epiblast  (Matsui  et  al.  1992).  Analyses  performed  on  these  cell  lines  enabled  the 
identification  of  key  genetic  factors  that  regulate  pluripotency,  such  as  OCT4 
(octamer-binding  transcription  factor  4,  also  known  as  POU5F1 — POU  domain, 
class  5,  transcription  factor  1),  NANOG  (Nanog  homeobox),  SOX2  (SRY 
[sex-determining  region  Y]-box  2),  and  SALL4  (Sal-like  protein  4)  for  ES  cells 
(Mitsui  et  al.  2003;  Loh  et  al.  2006;  Wu  et  al.  2006);  CDX2  (caudal  type  homeobox 
2),  EOMES  (eomesodermin),  and  TEAD4  (TEA  domain  family  member  4)  for  TS 
cells  (Niwa  et  al.  2005;  Yagi  et  al.  2007);  and  GATA4  (GATA-binding  protein  4), 
GATA6  (GATA-binding  protein  6),  SOX7  (SRY  [sex-determining  region  Y]-box  7), 
and SOX17 (SRY [sex-determining region Y]-box 17) for XEN cells (Kunath
et al. 2005; Lim et al. 2008). The study of these stem cell lines also revealed an
intriguing  interplay  between  pluripotency  transcription  factors  and  epigenetic 
mechanisms.  In  fact,  it  is  now  thought  that  the  dynamic  balance  between  these  two 
regulatory  systems  may  form  the  basis  for  the  pluripotent  state. 


2.3.1    DNA  Methylation 

The  promoter  regions  of  many  pluripotency  genes  are  unmethylated  in  pluripotent 
stem  cell  lines  but  methylated  in  somatic  cells  (Fouse  et  al.  2008;  Meissner 
et  al.  2008;  Farthing  et  al.  2008;  Senner  et  al.  2012).  DNA  methylation  is  thought 
to  be  particularly  important  for  the  epigenetic  regulation  of  some  "gatekeeper" 
genes that reinforce the commitment of pluripotent stem cells to a certain lineage,
such as Elf5 (E74-like factor 5 [ets domain transcription factor]), which, together
with  Cdx2  and  Eomes,  safeguards  ES  cells  from  differentiating  into  trophoblast 
derivatives  (Ng  et  al.  2008).  Moreover,  DNA  methylation  is  the  only  epigenetic 
mechanism that represses the activity of some genes implicated in differentiation of
ES  cells,  and  lack  of  DNA  methylation  in  mutant  ES  cells  leads  to  activation  of 
these  genes  (Fouse  et  al.  2008).  ES  cells  also  contain  substantial  levels  of  non-CpG 
methylation (Ramsahoye et al. 2000; Lister et al. 2009; Ziller et al. 2011). Increased
levels  of  non-CpG  methylation  have  been  found  in  exons  of  highly  expressed  genes, 
such  as  OCT4  (Lister  et  al.  2009).  Despite  these  characteristics,  whether  DNA 
methylation is absolutely necessary to maintain the pluripotent state remains
controversial. Indeed, ES cells completely lacking DNA methylation (triple knockout
for Dnmt1, Dnmt3a, and Dnmt3b) can grow robustly and largely maintain
their undifferentiated characteristics (Tsumura et al. 2006).

The  cytosines  in  DNA  can  acquire  alternative  modifications  besides 
5-methylcytosine. The Tet1 protein, which is highly expressed in ES cells, can further
modify  5-methylcytosine  into  5-hydroxymethylcytosine,  5-formylcytosine,  and 
5-carboxylcytosine  (Ito  et  al.  2011;  Wu  et  al.  2011;  He  et  al.  2011).  High  levels 
of  5-hydroxymethylcytosine  and  5-formylcytosine  in  ES  cells  are  associated  with 
actively  transcribed  genes,  as  well  as  with  Polycomb-repressed  developmental 
regulators, and were demonstrated to guard against trans-differentiation to extraembryonic
lineages (Ficz et al. 2011; Wu et al. 2011; Booth et al. 2012; Raiber
et al. 2012). However, deletion of the Tet1 gene in mice is compatible with embryonic
and postnatal development, possibly due to partial compensation by Tet2. Accordingly,
mutant ES cells display only subtle changes in gene expression and skewed
differentiation towards trophectoderm in vitro (Dawlaty et al. 2011).


2.3.2    Histone  Modifications 

In  addition  to  DNA  modifications,  histone  modifications  are  also  important  in 
controlling gene expression during self-renewal of pluripotent stem cell lines. In
agreement  with  the  previously  established  notion  that  H3K4me3  is  an  activating 
histone  modification  (Santos-Rosa  et  al.  2002),  peaks  of  this  mark  are  observed  in 
ES  cells  in  association  with  promoter  regions  of  key  pluripotency  genes  (Azuara 
et  al.  2006;  Barski  et  al.  2007).  However,  approximately  2,000  genes  that  are 
transcriptionally  repressed  in  ES  cells  but  are  required  for  later  differentiation 
(such  as  Sox,  Hox,  Fox,  Pax,  and  Irx  gene  families)  are  concomitantly  decorated 
at their promoter regions with both active H3K4me3 and repressive H3K27me3, a
pattern dubbed "bivalent" (Azuara et al. 2006; Bernstein et al. 2006; Pan
et  al.  2007;  Zhao  et  al.  2007).  The  bivalent  domains  are  often  found  at  promoters 
containing  CpG  islands  and  many  bind  OCT4,  NANOG,  or  SOX2  (Bernstein 
et al. 2006; Mikkelsen et al. 2007). Virtually all bivalent domains bind PcG proteins
belonging to the PRC2 complex (EED [embryonic ectoderm development],
AEBP2 [AE-binding protein 2], SUZ12 [suppressor of zeste 12 homolog, Drosophila],
and the H3K27 methyltransferase EZH2 [enhancer of zeste homolog 2, Drosophila])
(Ku et al. 2008). Recently, JARID2 (jumonji, AT-rich interactive domain 2,
a member of the Jumonji family of lysine demethylases) and the
Polycomb-like 2 (PCL2) protein were also found to associate with PRC2 in mouse
ES  cells  and  are  thought  to  play  important  roles  in  pluripotency  (Peng  et  al.  2009; 
Pasini  et  al.  2010;  Walker  et  al.  2010).  A  subset  of  the  bivalent  domains  also  binds 
PcG  proteins  of  the  PRC1  complex  (Ku  et  al.  2008).  The  PRC1  protein  RNF2  (ring 
finger protein 2, also known as RING1B) is responsible for ubiquitination of histone
H2A  at  lysine  119,  which  in  turn  is  responsible  for  RNA  polymerase  II  stalling 
(a  mechanism  for  transcriptional  silencing)  at  promoters  of  bivalent  genes  (Stock 
et  al.  2007).  The  H3K4me3  mark  at  the  bivalent  domains  is  generated  by  the  H3K4 
methyltransferase  activity  of  the  MLL/trithorax  complex  (myeloid/lymphoid  or 
mixed-lineage leukemia [trithorax homolog, Drosophila]) (Dou et al. 2006). Similar
bivalent chromatin profiles have also been identified in TS cells at promoter
regions of silenced, lineage-specific regulatory genes (Santos et al. 2010). However,
in XEN cells, lineage-specific genes are marked solely by repressive histone
modifications, a pattern thought to reflect the restricted developmental potential of
these cells (Santos et al. 2010).
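
The promoter states described above can be summarized as a simple decision rule. The sketch below is illustrative only: it assumes that each promoter has already been called positive or negative for H3K4me3 and H3K27me3 (for example, from ChIP-seq peak calls), and the function names and state labels are hypothetical rather than taken from any of the cited studies.

```python
from collections import Counter


def promoter_state(h3k4me3: bool, h3k27me3: bool) -> str:
    """Label a promoter from two binary histone-mark calls (illustrative rule)."""
    if h3k4me3 and h3k27me3:
        return "bivalent"    # active and repressive marks present together
    if h3k4me3:
        return "active"      # H3K4me3 only
    if h3k27me3:
        return "repressed"   # H3K27me3 only
    return "unmarked"        # neither mark called


def summarize(calls):
    """Count promoter states for an iterable of (h3k4me3, h3k27me3) pairs."""
    return Counter(promoter_state(k4, k27) for k4, k27 in calls)


if __name__ == "__main__":
    # Toy input, not real data: three promoters with different mark combinations.
    print(summarize([(True, True), (True, False), (False, True)]))
```

In this scheme, resolution of a bivalent domain during differentiation corresponds to a promoter moving from "bivalent" to either "active" or "repressed".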

The repressive histone modifications H3K9me2 and H3K9me3 occupy different
regions of the ES cell genome. H3K9me2 is found in large blocks in the
genome (several megabases each), which are highly conserved between mouse and
human (Wen et al. 2009). H3K9me3, found in ES cells mostly in partnership with
H4K20me3, is required for silencing several classes of endogenous retroviruses
(Mikkelsen et al. 2007; Rowe et al. 2010; Matsui et al. 2010; Macfarlan et al. 2011).

In addition to promoters and repetitive DNA, histone modifications are particularly
important in regulating the activity of enhancer elements (short regions of
DNA, often found distant from transcription start sites, that bind transcription
factors and enhance gene transcription). Based on the patterns of histone
modifications,  two  distinct  classes  of  enhancers  can  be  identified  in  ES  cells. 
Both  classes  are  characterized  by  open  chromatin,  marked  by  the  presence  of 
DNase  I  hypersensitive  sites  (DHSs),  enrichment  in  highly  mobile  nucleosomes 
containing  the  specialized  histone  variants  H3.3  and  H2A.Z,  binding  of  the  histone 
acetyltransferase P300, and monomethylation of histone H3 at lysine
4 (H3K4me1). Active enhancers, often located in the vicinity of active genes
such  as  the  key  pluripotency  genes,  are  characterized  by  acetylation  of  histone 
H3  at  lysine  27  (H3K27ac).  In  contrast,  the  so-called  poised  enhancers,  located  near 
genes involved in controlling early steps of differentiation and marked with bivalent
domains at their promoters, are depleted in H3K27ac and instead are enriched
in  H3K27me3  and  H3K9me3  (Creyghton  et  al.  2010;  Rada-Iglesias  et  al.  2011; 
Zentner  et  al.  2011;  Buecker  and  Wysocka  2012). 
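
The two enhancer classes can be expressed as a small rule set. The following is a minimal sketch under stated assumptions: the input is the set of marks already detected at a candidate region, H3K4me1 is treated as the shared enhancer signature, and the function name and the "unclassified" fallback are hypothetical additions for illustration, not part of the cited analyses.

```python
from typing import Iterable


def classify_enhancer(marks: Iterable[str]) -> str:
    """Classify a candidate region as an active or poised enhancer (illustrative)."""
    present = set(marks)
    if "H3K4me1" not in present:
        return "not a candidate"   # lacks the signature shared by both classes
    if "H3K27ac" in present:
        return "active"            # acetylated at H3K27
    if "H3K27me3" in present:
        return "poised"            # depleted in H3K27ac, enriched in H3K27me3
    return "unclassified"          # assumption: no call when neither mark is present


if __name__ == "__main__":
    print(classify_enhancer({"H3K4me1", "H3K27ac"}))    # region near an active gene
    print(classify_enhancer({"H3K4me1", "H3K27me3"}))   # region near a bivalent promoter
```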


2.3.3    Chromatin-Modifying  Complexes 

With the exception of acetylation, most histone modifications do not directly
impose changes in chromatin conformation. Instead, they often bind
chromatin-remodeling  factors,  which  utilize  the  energy  released  from  ATP 



I.  Sandovici 


hydrolysis to exchange histones and reposition or evict nucleosomes. When compared
with differentiated cells, pluripotent stem cells are characterized by a generally
open chromatin state (Gaspar-Maia et al. 2011). There are four families of
chromatin  remodelers  (SWI/SNF,  CHD/NURD,  ISWI,  and  INO80)  and  many 
subunits  of  these  families  have  been  shown  to  play  important  roles  in  pluripotent 
stem  cells  (reviewed  by  Gaspar-Maia  et  al.  2011).  For  example,  the  SWI/SNF 
family member BRG1 (also known as SMARCA4) opposes PcG proteins by
opening  the  chromatin  at  LIF/STAT3  (leukemia  inhibitory  factor/signal  transducer 
and  activator  of  transcription  3)  target  sites  in  ES  cells.  However,  BRG1  also 
facilitates  PcG  function  at  classical  PcG  targets,  including  all  four  Hox  loci, 
reinforcing their repression in ES cells (Ho et al. 2011). The chromodomain
helicase DNA-binding protein 1 (CHD1), a member of the CHD family, binds globally
to active euchromatin and co-localizes with RNA polymerase II (RNAPII) in ES
cells, and CHD1 depletion by RNA interference leads to accumulation of high levels
of heterochromatin (Gaspar-Maia et al. 2009). CHD3 and CHD4 constitute the
catalytic subunit of the nucleosome-remodeling and deacetylase (NuRD) complex, which also
contains  histone  deacetylases  (HDAC1  and  HDAC2)  and  a  methyl-binding  protein 
(MBD3).  MBD3  cooperates  with  BRG1  to  maintain  the  global  levels  of 
5-hydroxymethylcytosine  in  ES  cells  (Yildirim  et  al.  2011).  Finally,  the  TIP60/ 
KAT5-P400 (lysine acetyltransferase 5/E1A-binding protein p400) complex of the
INO80 family facilitates transcription by combining nucleosome remodeling with
histone acetyltransferase activity. ES cells depleted of different subunits of the TIP60-P400
complex  exhibit  decreased  proliferation  rates,  reduced  pluripotency,  and  reduced 
viability  (Fazzio  et  al.  2008).  The  TIP60-P400  complex  also  binds  H3K4me3  at 
bivalent  domains,  an  interaction  that  is  facilitated  by  Nanog  (Fazzio  et  al.  2008). 


2.3.4   Mechanisms  for  Targeting  Epigenetic 
Patterns  in  Pluripotent  Stem  Cells 

The  patterns  of  chromatin  and  DNA  modifications  in  pluripotent  stem  cells  cannot 
be  explained  only  by  the  genomic  distribution  of  transcription  factor-binding  sites. 
Indeed,  targeting  of  PcG  proteins  at  the  bivalent  promoters  in  ES  cells  is  only 
partially explained by the concomitant binding of the core pluripotency transcription
factors (Bernstein et al. 2006; Mikkelsen et al. 2007). Therefore, other factors
such  as  the  local  DNA  sequence,  noncoding  RNAs  (ncRNAs),  and  the  higher  order 
chromatin  structure  may  be  important  in  this  process. 

The  direct  involvement  of  DNA  sequence  was  first  demonstrated  in  the  case  of 
JARID2, a protein that binds directly to DNA and plays a major role in targeting
PRC2 complexes to the correct sites (Peng et al. 2009; Pasini et al. 2010). Additionally,
a recent study has identified a PcG responsive element in human ES cells, a
highly conserved 1.8 kb DNA sequence located between the HOXD11 (homeobox
D11) and HOXD12 (homeobox D12) genes, which is nucleosome depleted and
GC-rich and contains YY1 transcription factor-binding sites (Woo et al. 2010).
It has also been demonstrated that short DNA sequences inserted into mouse
ES cells can autonomously induce hypo- or de novo methylation in cis (Lienert
et al. 2011).

ncRNAs may be another important class of regulators for establishing epigenetic
patterns in ES cells. Short ncRNAs (<200 nt) interact with PRC2 and
are involved in stabilizing PRC2 association with chromatin, though the importance
of direct base pairing at specific sequence motifs is still unknown (Kanhere
et  al.  2010).  Furthermore,  over  3,000  large  intergenic  noncoding  RNAs 
(lincRNAs)  have  been  recently  identified  in  mouse  ES  cells.  At  least  a  third  of 
them  are  associated  with  chromatin  complexes  involved  in  reading,  writing,  or 
erasing  histone  modifications  and  are  critical  for  pluripotency  maintenance 
(Guttman  et  al.  2009,  2011). 

Finally, the higher order chromatin structure may also contribute to the appropriate
establishment of epigenetic marks in pluripotent stem cells (reviewed by
Li  et  al.  2012).  The  insulator  protein  CTCF  (CCCTC-binding  factor  [zinc  finger 
protein]),  known  to  mediate  long-range  interactions  between  distant  regulatory 
elements,  has  been  shown  to  cooperate  with  Oct4  in  organizing  the  chromatin 
loops  at  the  Nanog  locus  (Levasseur  et  al.  2008).  A  recent  genomic  analysis  in 
ES  cells  has  revealed  that  binding  sites  for  cohesin  (a  key  partner  for  CTCF), 
mediator,  and  the  cohesin-loading  factor  NIPBL  (nipped-B  homolog  [Drosophila]) 
overlap  at  active  promoters  and  enhancers  (Kagey  et  al.  2010).  Together,  these 
complexes  contribute  to  chromatin  looping  between  enhancers  and  promoters  in 
patterns  that  are  specific  to  ES  cells  (Kagey  et  al.  2010). 


2.4    Establishment  of  Epigenetic  Patterns 
During  Differentiation 

Gene deletion studies in mice or knockdown experiments in ES cells demonstrated
for the first time that many epigenetic modifiers play critical roles during differentiation.
Accordingly, deletion of Dnmt1 results in lethality before E10.5
(Li et al. 1992) and disruption of Dnmt3b is lethal before E9.5 (Okano et al. 1999).
Additionally,  ES  cells  deficient  for  all  three  DNA  methyltransferases  show 
increased  cell  death  upon  differentiation  into  epiblast  lineages,  but  not  during 
differentiation into extraembryonic lineages, and do not contribute to embryonic
lineages  when  injected  into  blastocysts  (Sakaue  et  al.  2010).  Deletion  of  the  histone 
methyltransferase  G9a/EHMT2  (euchromatic  histone-lysine  N-methyltransferase  2) 
results  in  embryonic  lethality  between  E8.5  and  E9.5  (Tachibana  et  al.  2002). 
Knockdown  of  various  PcG  proteins  in  ES  cells  affects  their  ability  to  differentiate 
(Azuara  et  al.  2006;  Bernstein  et  al.  2006;  Pasini  et  al.  2007,  2010).  Depletion  of 
the  MLL  complex  component  DPY30  (dpy-30  homolog  [C.  elegans])  in  ES  cells, 
which  decreases  H3K4me3  at  bivalent  domains,  results  in  a  significant  reduction  in 
the differentiation potential, particularly along the neural lineage (Jiang et al. 2011).
Deletions of Mbd3 or Hdac1 result in aberrant differentiation of mouse ES cells
(Kaji et al. 2006; Dovey et al. 2010).

In addition to these loss-of-function studies, another major approach for improving
our  understanding  of  the  transitions  that  occur  during  differentiation  was  the  use 
of  various  "omics"  analyses.  Most  studies  that  addressed  the  role  of  epigenetic 
modifications  during  differentiation  have  compared  the  genomic  distribution  of 
various  marks  between  ES  cells  and  cells  differentiated  in  vitro  or  with  donor-derived 
somatic  cells.  These  studies  have  identified  several  general  mechanisms  implicated 
in  the  establishment  of  tissue-specific  epigenetic  patterns  during  differentiation. 

2.4.1    Dynamics of DNA Methylation During Differentiation

Whereas  a  small  number  of  genes  undergo  DNA  demethylation  upon  commitment 
to  a  cell  lineage,  many  more  gain  CpG  methylation  (Fouse  et  al.  2008;  Meissner 
et al. 2008). Loss of DNA methylation (Fig. 2.2a) is observed especially at lineage-specific
gene-regulatory elements. De novo DNA methylation (Fig. 2.2b) is responsible
for the active repression of core pluripotency and germline-specific genes, as
well as for some lineage choice events. Repression of pluripotency genes is initiated
by local binding of the G9a histone methyltransferase, which through its SET
domain brings about local methylation of histone H3 at lysine 9 (H3K9me3) (Feldman
et al. 2006; Epsztejn-Litman et al. 2008). Subsequently, H3K9me3 binds
HP1/CBX5  (chromobox  homolog  5),  thus  generating  a  local  heterochromatic 
structure.  In  parallel,  G9a/EHMT2  recruits  the  DNA  methyltransferases  Dnmt3a 
and  3b,  which  then  induce  de  novo  methylation  of  these  genes  (Feldman  et  al.  2006; 
Epsztejn-Litman et al. 2008). De novo DNA methylation during ES cell differentiation
also occurs at CpG island promoters and at sequences outside of promoter
regions,  many  of  which  act  as  enhancers  (Mohn  et  al.  2008;  Meissner  et  al.  2008; 
Stadler  et  al.  2011).  Importantly,  a  recent  study  performed  on  early  embryos 
confirmed  that  Dnmt3b  catalyzes  the  gain  of  DNA  methylation  in  E6.5  epiblast 
cells (Borgel et al. 2010). Similar to the data obtained in ES cells cultured
in vitro, these epigenetic events target promoters of pluripotency and germline-specific
genes, as well as genes programmed to be expressed later during development.
For the latter category of genes, promoter methylation acquired in the epiblast is
then erased during terminal cell differentiation (Borgel et al. 2010).

2.4.2  Resolution  of  Bivalent  Domains 

In  the  case  of  bivalent  domains  (Fig.  2.2c)  it  is  thought  that  the  concomitant 
presence  of  active  and  repressive  modifications  in  pluripotent  stem  cells  allows 
rapid  resolution  of  these  domains  into  single  H3K27me3  or  H3K4me3  marks 
during  differentiation  (Mikkelsen  et  al.  2007).  Removal  of  H3K27me3  is  achieved 
by two H3K27 demethylases: UTX/KDM6A (lysine [K]-specific demethylase 6A)
and  JMJD3/KDM6B  (lysine  [K]-specific  demethylase  6B)  (Agger  et  al.  2007;  Lee 
et al. 2007; Lan et al. 2007). For example, human ES cells induced to differentiate

Fig. 2.2 Epigenetic changes during differentiation (see text for details). (a) Loss of DNA
methylation at lineage-specific regulatory genes (together with changes in histone marks)
leads to gene activation. (b) De novo DNA methylation at promoters of pluripotency genes or
germline-specific genes. (c) Epigenetic reprogramming of bivalent promoters allows gene
silencing or activation in a tissue-specific manner. (d) Transition between the "poised" and
the "active" state at enhancer elements during differentiation. (e) The role of "pioneer" transcription
factors (such as FoxA1, forkhead box A1) in generating de novo enhancers during
differentiation. K: lysine; me: methylation; ac: acetylation; P300: histone acetyltransferase;
PRC2: Polycomb group (PcG) repressive complex 2; Pol II: RNA polymerase II; eRNA:
enhancer-associated RNAs

by  treatment  with  retinoic  acid  recruit  KDM6A  at  the  promoters  of  the  anterior 
genes  of  HOXA  and  HOXB  loci.  Recruitment  of  KDM6A  to  these  promoters 
coincides  with  disappearance  of  H3K27me3,  decreased  occupancy  of  the  PRC2 
complex  components  SUZ12  and  EZH2,  and  gene  activation,  while  knockdown  of 
KDM6A prevents these events (Agger et al. 2007; Lee et al. 2007). KDM6A has
also  been  demonstrated  to  activate  muscle-specific  genes  during  myogenesis,  being 
targeted to the correct promoters by the transcriptional activator Six4 (SIX homeobox
4) (Seenundun et al. 2010). Interestingly, KDM6A associates with two
H3K4  methyltransferases,  MLL3/KMT2C  (myeloid/lymphoid  or  mixed-lineage 
leukemia  3)  and  MLL4/KMT2D  (myeloid/lymphoid  or  mixed-lineage  leukemia  4), 
suggesting  cooperation  between  H3K4  methylation  and  H3K27  demethylation 
(Lee  et  al.  2007;  Issaeva  et  al.  2007).  JMJD3/KDM6B  has  been  demonstrated  to 
resolve  the  bivalent  domain  at  the  Nes  (nestin)  gene  promoter  and  to  control  the 
expression  of  key  regulators  and  markers  of  neurogenesis  during  the  commitment  of 
ES  cells  towards  the  neural  lineage  (Burgold  et  al.  2008).  Removal  of  H3K4me3  from 
the  bivalent  domains  is  achieved  by  the  KDM5  demethylases.  KDM5A  (JARID1A/ 
RBP2)  is  recruited  at  the  bivalent  domains  by  the  PRC2  complex  (Pasini  et  al.  2008). 
Additionally,  KDM5B  (JARID1B/PLU1)  binds  to  a  substantial  fraction  of  bivalent 
domains  in  ES  cells  and  is  required  for  silencing  stem  cell  and  germ  cell-specific  genes 
during ES cell differentiation into neural progenitor cells (Schmitz et al. 2011).

It should be stressed that bivalent domains play important roles throughout
differentiation.  Indeed,  when  ES  cells  are  differentiated  into  neural  cells,  the 
resolution  of  some  bivalent  domains  is  counterbalanced  by  appearance  of  new 
ones  at  other  promoter  regions.  Moreover,  ~41  %  of  the  bivalent  domains  found 
in  ES  cells  are  preserved  after  differentiation  into  terminal  pyramidal  neurons 
(Mohn  et  al.  2008).  In  hemangioblasts,  which  are  hematopoietic/endothelial 
precursors,  some  neuronal  genes  retain  bivalency  and  require  the  presence  of  the 
PRC1  component  RING1B/RNF2  to  remain  silent  (Mazzarella  et  al.  2011).  Adult 
stem  cells,  which  maintain  the  natural  homeostasis  of  adult  tissues  by  supplying  a 
continuous  pool  of  differentiated  cells  in  response  to  external  signals,  also  contain 
bivalent  chromatin  domains  (Mikkelsen  et  al.  2007;  Cui  et  al.  2009). 


2.4.3    Chromatin  Changes  at  Enhancer  Elements 

Epigenetic  reprogramming  at  enhancer  elements  is  perhaps  one  of  the  most 
important  events  in  the  establishment  of  tissue-specific  gene  expression  patterns 
during  differentiation.  Indeed,  during  differentiation  of  human  ES  cells  into  a 
mesendodermal  lineage,  chromatin  modifications  at  promoters  remained  largely 
invariant,  with  much  greater  dynamics  in  chromatin  modifications  at  enhancers, 
especially for H3K4me1 and H3K27ac (Hawkins et al. 2011). The main event that
takes place during differentiation at enhancers is a switch from the poised to the
active status (Fig. 2.2d), which coincides with the ability to drive gene expression.
Interestingly,  recent  evidence  suggests  that  the  pluripotency  factors  active  in  ES 
cells  are  not  only  involved  in  maintaining  the  pluripotent  state  of  these  cells  but  also 
have  a  direct  role  during  differentiation.  Accordingly,  SOX2,  which  binds  many 
poised  enhancers  in  ES  cells,  is  replaced  by  SOX3  in  neural  progenitor  cells  and 
then  by  SOX11  in  differentiated  neurons.  Upon  binding  of  activating  SOX3  or 
SOX11 transcription factors, the poised chromatin state is resolved into an active
one  (Bergsland  et  al.  2011).  Other  poised  enhancers  are  co-occupied  by  several 
pluripotency  factors  in  ES  cells  and  this  multiple  binding  is  thought  to  prevent  their 
premature  activation.  One  example  is  the  EOMES  enhancer,  which  is  bound  by 
OCT4,  SOX2,  and  NANOG  in  human  ES  cells  (Teo  et  al.  2011).  At  the  onset  of 
endoderm  specification,  SOX2  departure  and  the  persistence  of  NANOG  binding 
lead  to  activation  of  this  enhancer  and  increased  expression  of  eomesodermin 
(Teo  et  al.  2011). 

Not all enhancers that are used during differentiation are established in pluripotent
cells as active or poised ones. Many enhancers are generated de novo during
differentiation by the intervention of the so-called pioneer factors, which are often
lineage-specific transcription factors that have the ability to bind DNA sequences at
chromatin-compacted regions (Fig. 2.2e). Pioneer factors such as FOXA (forkhead
box A) and GATA (GATA-binding protein) subsequently recruit chromatin
remodelers, which establish the characteristic open chromatin structure of active
enhancers (Zaret and Carroll 2011).


2.5    Cell Type-Specific Epigenetic Patterns
in Differentiated Mammalian Cells: Lessons
Learned from the ENCODE Project

A first indication of the complexity of cell type-specific epigenetic patterns came
from  the  analysis  of  just  1  %  of  the  human  genome  during  the  pilot  phase  of  the 
ENCODE  project  (ENCODE  Project  Consortium  2007).  This  preliminary  analysis 
was  then  extended  to  the  entire  genome  during  the  production  stage  of  the 
ENCODE project. This stage of the project concluded with the release of a much
more  extensive  set  of  results  (Fig.  2.3)  obtained  in  147  different  cell  types 
(including  both  immortalized  cell  lines  and  primary  cell  types  from  a  variety  of 
tissues  and  developmental  stages).  The  most  important  conclusion  of  the  project 
was  that  approximately  80  %  of  the  human  genome  participates  in  at  least  one 
biochemical  function,  most  of  which  are  related  to  gene  regulation  (ENCODE 
Project  Consortium  2012).  Some  of  the  most  relevant  findings  of  the  project  are 
summarized  below. 


2.5.1    Transcriptional  Landscape 

RNA  sequencing  was  performed  in  the  set  of  15  cell  types  most  commonly 
used  across  the  consortium  and  showed  that,  although  the  20,687  protein-coding 
genes  cover  less  than  3  %  of  the  genome,  cumulatively,  nearly  75  %  of  the  human 
genome is transcribed (Djebali et al. 2012). Each protein-coding gene associates on

Fig. 2.3 Example of epigenome maps obtained through the ENCODE project. This window is
centered around the hepatocyte nuclear factor 4, alpha (HNF4A) locus on human chromosome
20, as visualized in the UCSC Genome Browser (http://genome.ucsc.edu). The top part of the
figure indicates the alternative transcripts identified at this locus. Below these is a series of maps
indicating  the  distribution  of  various  histone  marks,  transcription  factor-binding  sites,  and  regions 
of  open  chromatin  in  several  cell  types,  with  peaks  of  enrichments  indicated  by  vertical  bars 


2    Establishment  of  Tissue-Specific  Epigenetic  States  During  Development 



average with 6.3 alternative transcripts and, although many isoforms are expressed
simultaneously  in  a  single  cell  type,  one  of  these  dominates  (Djebali  et  al.  2012). 
The  majority  of  protein-coding  genes  (53  %)  are  constitutively  expressed  (in  all  cell 
lines)  and  only  a  small  fraction  (7  %)  are  cell  line  specific  (Djebali  et  al.  2012). 
ENCODE also identified 9,277 manually curated long noncoding RNA (lncRNA)
loci generating ~15,000 transcripts, most of which are associated with chromatin
and display more tissue-specific expression patterns than the protein-coding
genes (Derrien et al. 2012). Approximately 18 % of the protein-coding and lncRNA
genes exhibit allele-specific expression (Djebali et al. 2012). ENCODE also
identified  11,216  pseudogenes,  of  which  876  are  transcribed  (Pei  et  al.  2012). 
Only  a  small  fraction  of  the  transcribed  pseudogenes  are  active  in  all  tissues 
analyzed,  while  most  are  transcribed  only  in  one  tissue  (Pei  et  al.  2012).  There 
are also 7,053 annotated small RNAs, which include 1,944 small nuclear (sn)RNAs,
1,521 small nucleolar (sno)RNAs, 1,756 micro (mi)RNAs, and 624 transfer (t)RNAs
(Djebali et al. 2012). Other categories of transcripts include unannotated short
RNAs such as the promoter-associated short RNAs (PASRs) and the termini-associated
short RNAs (TASRs), transcripts emanating from repeat elements, and
enhancer-associated RNAs (eRNAs) (Djebali et al. 2012).
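
As a quick check on how the counts quoted in this section relate to one another, the short sketch below recomputes two derived quantities from the figures given in the text: how many of the annotated small RNAs fall into the four named classes, and what fraction of pseudogenes is transcribed. The variable and function names are hypothetical, and the arithmetic assumes that the four classes do not overlap.

```python
# Counts quoted in the text (Djebali et al. 2012; Pei et al. 2012).
ANNOTATED_SMALL_RNAS = 7053
SMALL_RNA_CLASSES = {"snRNA": 1944, "snoRNA": 1521, "miRNA": 1756, "tRNA": 624}
PSEUDOGENES_TOTAL = 11216
PSEUDOGENES_TRANSCRIBED = 876


def small_rna_breakdown():
    """Return (count in the four named classes, count in other classes)."""
    listed = sum(SMALL_RNA_CLASSES.values())
    return listed, ANNOTATED_SMALL_RNAS - listed


def transcribed_pseudogene_fraction():
    """Fraction of annotated pseudogenes reported as transcribed."""
    return PSEUDOGENES_TRANSCRIBED / PSEUDOGENES_TOTAL


if __name__ == "__main__":
    listed, other = small_rna_breakdown()
    print(f"named classes: {listed}, other annotated small RNAs: {other}")
    print(f"transcribed pseudogenes: {transcribed_pseudogene_fraction():.1%}")
```

The four named classes account for most, but not all, of the annotated small RNAs, which is consistent with the text saying that the total "includes" them.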


2.5.2    DNA  Methylation  Landscape 

DNA  methylation  was  analyzed  in  82  cell  types  (cell  lines  and  primary  cells)  using 
reduced representation bisulfite sequencing (RRBS), which can interrogate 1.2
million  CpGs  located  in  intergenic  regions,  proximal  promoters,  and  gene  bodies 
(8.6  %  of  non-repetitive  genomic  CpGs),  with  a  preferential  bias  towards  CpG 
islands (Meissner et al. 2008). Ninety-six percent of all analyzed CpGs were
found  to  exhibit  differential  methylation  in  at  least  one  cell  type,  with  the  highest 
variability  found  at  gene  bodies  and  intergenic  regions,  rather  than  at  promoters 
(ENCODE  Project  Consortium  2012).  In  addition,  unmethylated  intragenic  CpG 
islands were found to be associated with binding of the P300 histone acetyltransferase, a known
marker  for  enhancer  activity  (Creyghton  et  al.  2010;  ENCODE  Project  Consortium 
2012).  Differential  DNA  methylation  associates  with  tissue-specific  binding  of 
CTCF,  a  ubiquitously  expressed  regulator  of  transcription  and  chromatin  structure. 
Comparison  between  DNA  methylation  distribution  and  CTCF-binding  sites  for  a 
subset of ~4,000 CTCF peaks indicates that over 40 % of the cell type-specific
CTCF  binding  is  associated  with  local  differential  DNA  methylation  (Wang 
et  al.  2012).  Additionally,  20  %  of  the  DHSs  with  cell  type-specific  accessibility 
show  a  significant  negative  correlation  with  levels  of  DNA  methylation,  while  the 
remaining  80  %  of  DHSs  are  constitutively  hypomethylated  (Thurman  et  al.  2012; 
Neph  et  al.  2012).  Moreover,  for  70  %  of  transcription  factors,  average  methylation 
at  cognate  binding  sites  is  significantly  and  negatively  correlated  with  transcript 
levels  of  the  corresponding  transcription  factors  (Thurman  et  al.  2012). 



2.5.3  Histone  Modification  Landscape 

The ENCODE project systematically analyzed 11 histone modifications and
1 histone variant (H2A.Z) in 46 cell types. The main conclusion of this analysis is
that  histone  modification  patterns  can  be  reliably  used  to  assign  functional  attributes 
to  genomic  regions  (ENCODE  Project  Consortium  2012;  Dong  et  al.  2012). 
For  example,  transcriptionally  active  GC-rich  (and  TATA-less)  promoters  are 
associated  with  H2A.Z,  H3K9ac,  H3K27ac,  H3K4me3,  and  H3K4me2,  while 
repressed  promoters  are  associated  with  H3K27me3  or  H3K9me3.  H3K79me2  and 
H3K36me3 are marks of transcription elongation; however, H3K79me2 occurs
preferentially at the 5' end of the gene bodies, while H3K36me3 is enriched at 3'
of the first intron (ENCODE Project Consortium 2012). These last two marks can also
be used to predict patterns of alternative splicing: H3K36me3 has a positive
contribution to exon inclusion, while H3K79me2 has a negative contribution (ENCODE
Project Consortium 2012). By overlapping histone patterns with DHS maps, 44,853
novel putative promoters were identified, many of which are active in a cell-specific
manner,  are  contained  within  the  gene  bodies  of  previously  annotated  genes,  and 
show  antisense  orientation  (Thurman  et  al.  2012).  Patterns  of  histone  modifications  at 
enhancer  regions  are  amongst  the  best  associated  with  cell-specific  gene  activity. 
Active enhancers are characterized by the presence of DHSs that bind RNA
polymerase II and are enriched in H3K4me1, H3K27ac, H3K9ac, and H3K79me2 and
depleted  in  H3K27me3  (Thurman  et  al.  2012;  Djebali  et  al.  2012). 

2.5.4  Open  Chromatin  Landscape 

Regions  of  open  chromatin  identified  by  DNase  I  hypersensitivity  are  often  found  at 
regulatory DNA regions. Using DNase-seq in 125 cell types, ~2.9 million DHSs
were  identified,  most  of  them  being  located  distal  to  transcription  start  sites  (TSSs) 
and  highly  cell  specific.  A  complementary  technique — FAIRE-seq  (formaldehyde- 
assisted  isolation  of  regulatory  elements)  performed  in  25  cell  types — also 
identified  ~4.8  million  sites  depleted  in  nucleosomes,  many  of  which  overlap 
with  DHSs  (ENCODE  Project  Consortium  2012;  Thurman  et  al.  2012). 
Overlapping DHSs with high-throughput ChIP-seq data for 42 transcription factors
in  the  K562  cell  line  (immortalized  myeloid  leukemia  cells)  showed  that  over  94  % 
of  the  transcription  factor-binding  sites  fall  within  accessible  chromatin.  Notable 
exceptions  are  transcription  factors  known  to  bind  to  compacted  heterochromatin, 
such  as  TRIM28  (tripartite  motif  containing  28),  SETDB1  (SET  domain,  bifurcated 
1),  and  ZNF274  (zinc  finger  protein  274)  (Thurman  et  al.  2012).  Moreover,  a 
correlation  between  distal  DHSs  and  DHSs  located  at  known  promoters  across 
79 cell types allowed functional connection of ~580,000 distal enhancers with their
target promoters. Most promoters are connected with more than one distal DHS and
vice versa, indicating a very complex cis-regulatory circuit of the human genome.
In addition to this synchronized activation between promoters and distal enhancers,
hundreds of enhancers around the genome showed patterns of matched
co-activation,  suggesting  highly  choreographed  cell  type-specific  behavior  and 
common  functions  (Thurman  et  al.  2012).  Finally,  micrococcal  nuclease  (MNase) 
digestion followed by high-depth sequencing was used to map nucleosome
occupancy in two cell types: GM12878 (lymphoblastoid cell line) and K562 (Kundaje
et al. 2012). This analysis, combined with 12 histone marks, DNase-seq, and
binding sites for 119 DNA-binding proteins, demonstrated that, with the exception
of the CTCF/cohesin complex, nucleosomes as well as histone marks are deposited
asymmetrically  around  promoters,  enhancers,  or  transcription  factor-binding  sites 
(Kundaje  et  al.  2012). 
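
The linking of distal DHSs to promoters described in this section rests on correlating accessibility profiles across cell types. The following is a minimal sketch of that idea only, not the published pipeline. The accessibility vectors are toy data, the cutoff `min_r=0.7` is an illustrative assumption, and the names `pearson` and `link_distal_to_promoters` are hypothetical. A real analysis would use quantitative signal across all 79 cell types and would also restrict candidate pairs by genomic distance.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-variation information
    return cov / sqrt(var_x * var_y)

def link_distal_to_promoters(distal, promoters, min_r=0.7):
    """Return (distal_id, promoter_id, r) for every pair with r >= min_r.

    min_r is an assumed, illustrative threshold.
    """
    links = []
    for d_id, d_signal in distal.items():
        for p_id, p_signal in promoters.items():
            r = pearson(d_signal, p_signal)
            if r >= min_r:
                links.append((d_id, p_id, round(r, 3)))
    return links

# Toy accessibility signal in five hypothetical cell types.
promoters = {"prom_A": [1, 0, 1, 0, 1], "prom_B": [0, 1, 0, 1, 0]}
distal = {
    "dhs_1": [1, 0, 1, 0, 1],  # tracks prom_A exactly
    "dhs_2": [0, 2, 0, 3, 0],  # tracks prom_B with varying strength
    "dhs_3": [1, 1, 1, 1, 1],  # constitutively open, so no link is called
}
links = link_distal_to_promoters(distal, promoters)
for distal_id, promoter_id, r in links:
    print(distal_id, "->", promoter_id, "r =", r)
```

In this toy set each cell type-specific distal DHS links to one promoter, whereas the constitutively open site links to none, which mirrors why the approach is informative only for elements whose accessibility varies between cell types.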

2.5.5    Long-Range  Interaction  Landscape 

Physical  interactions  between  distant  chromosomal  regions,  which  are  thought  to  be 
important for regulation of gene expression, were assessed using two complementary
technologies: 5C (chromosome conformation capture carbon copy) and ChIA-PET
(chromatin interaction analysis with paired-end tag sequencing) (ENCODE
Project  Consortium  2012;  Sanyal  et  al.  2012).  The  5C  approach  was  used  for  an 
unbiased  interrogation  of  all  interactions  between  TSSs  previously  identified  by  the 
pilot  ENCODE  project  and  distal  genomic  regions.  This  assay  performed  in  four 
cell  types  identified  over  1,000  long-range  interactions  (Sanyal  et  al.  2012).  The 
most  frequent  interactions  of  the  assessed  TSSs  were  with  enhancers,  other 
promoters,  and  CTCF-binding  sites,  and  each  of  these  elements  was  found  to  be 
engaged  in  multiple  interactions.  The  TSS-enhancer  and  TSS-promoter 
interactions  were  often  found  to  be  cell  type  specific,  while  the  interactions  of 
TSS-CTCF  were  most  of  the  time  common  to  all  four  cell  types  (Sanyal 
et al. 2012). The ChIA-PET approach, which interrogates interactions between
chromatin  regions  that  bind  RNA  polymerase  II,  has  been  applied  within  the 
ENCODE  project  for  the  K562  cell  line.  This  analysis  identified  over  120,000 
promoter-centered  interactions,  the  vast  majority  of  which  were  intrachromosomal. 
Similar  to  the  5C  approach,  this  analysis  showed  that  most  promoters  are  engaged 
in  multiple  promoter-enhancer  and  promoter-promoter  interactions  (ENCODE 
Project Consortium 2012). ChIA-PET has also been used in an independent
study  performed  in  five  human  cell  types  (including  K562)  (Li  et  al.  2012).  This 
study  demonstrated  widespread  promoter-promoter  interactions  between  genes 
transcribed cooperatively, as well as cell type-specific promoter-enhancer
interactions  (Li  et  al.  2012). 

2.6    Tissue-Specific  Epigenetic  States  and  Human  Disease 

It  is  increasingly  acknowledged  that  epigenetic  phenomena  may  be  a  crucial 
component  in  the  development  of  human  disease.  The  importance  of  epigenetics 
has been clearly demonstrated in monogenic disorders involving imprinted genes
(such as Beckwith-Wiedemann, Prader-Willi, and Angelman syndromes), in
single-gene  disorders  of  the  epigenetic  machinery  (such  as  Rett,  ICF,  ATRX,  and 
Rubinstein-Taybi  syndromes)  and  in  cancer  (reviewed  by  Feinberg  2007;  Portela 
and  Esteller  2010).  Since  these  subjects  are  being  presented  in  depth  elsewhere  in 
the  book,  in  this  section  I  discuss  the  existing  evidence  for  tissue-specific  epigenetic 
alterations  in  common  diseases  and  the  link  between  genetic  variants,  tissue- 
specific  epigenetic  patterns,  and  disease. 


2.6.1    Epigenetic  Alterations  in  Common  Human  Diseases 

The tissue specificity of epigenetic patterns makes it less straightforward to
extrapolate the epigenetic information obtained in accessible samples such as peripheral
white blood cells or buccal cells to the relevant tissues involved in various human
diseases.  Additional  obstacles  in  studying  epigenetic  alterations  in  the  context  of 
human  diseases  are  the  observed  variation  of  epigenetic  marks  between  healthy 
individuals  and  with  advancing  age  (Sandovici  et  al.  2003;  Bjornsson  et  al.  2008). 
Despite  these  important  challenges,  in  the  past  decade  a  number  of  studies  have 
been  able  to  uncover  epigenetic  alterations  in  several  major  forms  of  common 
human  diseases  in  tissues  and  at  loci  that  are  directly  involved  in  the  pathogeneses 
of  the  studied  conditions. 

By far the most studied epigenetic mark to date is DNA methylation, a fact
explained  at  least  in  part  by  the  ease  of  obtaining  DNA  samples  compared  with 
good-quality  chromatin,  as  well  as  by  the  robustness  and  relatively  low  cost  of  new 
microarray-based technologies. For example, a study performed in monozygotic (MZ)
twins discordant for type 1 diabetes (T1D) using Illumina Infinium 27 K
microarrays to measure DNA methylation in CD14+ monocytes identified 58 CpG
sites hypermethylated and 78 hypomethylated in the T1D-affected co-twins
(Rakyan et al. 2011). Using the same technology to measure DNA methylation in
the CD4+ T lymphocytes from patients with systemic lupus erythematosus and
controls, 236 hypomethylated and 105 hypermethylated CpG sites were identified
(Jeffries  et  al.  2011).  Another  study  using  Illumina  Infinium  27  K  microarrays 
identified  276  CpG  loci  affiliated  to  promoters  of  254  genes  displaying  significant 
differential  DNA  methylation  in  islets  from  type  2  diabetes  (T2D)  patients  compared 
with  controls,  244  of  which  were  hypomethylated.  These  methylation  changes 
affected many genes implicated in β-cell survival and function, were absent in
blood  cells  from  T2D  individuals,  and  could  not  be  induced  experimentally  in 
nondiabetic  islets  exposed  to  high  glucose  (Volkmar  et  al.  2012).  As  a  last  example, 
a  study  performed  in  the  frontal  cortex  of  patients  with  schizophrenia  or  bipolar 
disorder  versus  controls  using  CpG-island  microarrays  identified  DNA  methylation 
differences  at  dozens  of  loci,  including  several  involved  in  glutamatergic  and 
GABAergic neurotransmission, brain development, and other processes functionally
linked to these diseases (Mill et al. 2008).

A variety of histone marks associated with transcriptional activation or repression
have been studied in several common diseases in a tissue-specific context.
For example, in human prefrontal cortexes from patients with schizophrenia, the
decreased  GAD1  (glutamate  decarboxylase  1  [brain,  67  kDa])  expression  compared 
to  controls  was  associated  with  decreased  levels  of  promoter  H3K4me3,  especially 
in  females  (Huang  et  al.  2007).  Altered  levels  of  the  repressive  histone  mark 
H3K9me2 have been found at many loci implicated in autoimmunity and inflammation
in lymphocytes collected from T1D patients versus controls (Miao
et al. 2008). Marked differences in H3K9ac levels were found at the upstream
regions  of  HLA-DRB1  (major  histocompatibility  complex,  class  II,  DR  beta  1)  and 
HLA-DQB1  (major  histocompatibility  complex,  class  II,  DQ  beta  1)  genes  in  T1D 
monocytes  relative  to  controls  (Miao  et  al.  2012).  Differential  distribution  of 
H3K4me3  and  H3K9me3  peaks  across  the  genome  has  been  identified  in 
cardiomyocytes collected from patients with heart failure caused by dilated
cardiomyopathy compared to controls, and many disease-dependent clusters contained
genes  implicated  in  signal  transduction  pathways  for  cardiac  function  (Kaneda 
et  al.  2009). 

A  number  of  recent  studies  have  also  started  to  identify  important  roles  for 
tissue-specific  epigenators  in  the  pathogenesis  of  several  common  human 
diseases.  One  illustrative  example  was  recently  published  in  human  pancreatic 
islets. A comprehensive strand-specific transcriptome analysis identified 1,128
lncRNAs, many of which are cell specific and linked with β-cell differentiation
and maturation programs. Using a candidate gene approach, several of these
genes  were  found  abnormally  expressed  in  samples  collected  from  T2D  patients 
(Moran  et  al.  2012). 


2.6.2    Genetic Variants, Tissue-Specific Epigenetic Patterns,
and  Human  Disease 

The  convergence  between  disease-associated  genetic  variants  emerging  from 
GWAS  and  epigenetic  maps  led  to  the  remarkable  observation  that  many  single- 
nucleotide  polymorphisms  (SNPs)  linked  with  various  diseases  are  located  at 
regulatory  DNA  sequences.  For  example,  rs7903146,  a  TCF7L2  (transcription 
factor  7-like  2  [T-cell  specific,  HMG-box])  intronic  variant  strongly  associated 
with  T2D  was  found  to  be  located  in  an  islet-selective  open  chromatin  region  that 
exhibits  enhancer  activity.  Human  islets  heterozygous  for  rs7903146  showed  allelic 
imbalance  in  the  local  chromatin  organization  and  altered  enhancer  activity 
(Gaulton  et  al.  2010).  Additionally,  a  systematic  analysis  of  chromatin-state 
dynamics  in  several  human  cell  types,  which  identified  cell  type-specific  enhancers, 
found  that  top-scoring  disease-associated  SNPs  are  frequently  positioned  within 
enhancer  regions  specifically  active  in  the  relevant  cell  types.  Accordingly,  SNPs 
associated  with  erythrocyte  phenotypes  are  located  in  enhancers  specific  to 
erythroleukemia  cells  (K562),  SNPs  associated  with  systemic  lupus  erythematosus 
are located in enhancers specific to lymphoblastoid cells (GM12878), while SNPs
associated with triglyceride and total lipid levels in blood are located in enhancers
specific  to  hepatocellular  carcinoma  cells  (HepG2)  (Ernst  et  al.  2011). 

Building  on  these  initial  observations,  the  recent  data  published  by  the  ENCODE 
consortium demonstrated unequivocally that over a third of the disease-associated
genetic variants that have emerged from the GWASs performed so far localize within
regulatory DNA elements marked by the presence of DHSs (ENCODE Project
Consortium 2012; Maurano et al. 2012; Schaub et al. 2012). Beyond this statistically
significant concentration of disease-associated genetic variants around regulatory
DNA elements, the systematic analysis of a large number of cell and tissue
types  led  to  several  additional  striking  observations.  First  of  all,  it  was  observed  that 
genetic  variants  associated  with  a  certain  disease  are  particularly  enriched  at  DHSs 
that are active in the cell types implicated in its pathogenesis. Examples include the
enrichment  of  genetic  variants  associated  with  Crohn's  disease  at  DHSs  active  in 
T  cells  (subtypes  TH17  and  TH1)  and  the  enrichment  of  SNPs  associated 
with  multiple  sclerosis  at  DHSs  active  in  CD3+  T  cells  from  cord  blood  and 
CD19+/CD20+ B cells (Maurano et al. 2012). In many cases, common SNPs
associated  with  specific  diseases  are  located  at  binding  sites  for  key  transcription 
factors  and,  as  a  result,  their  presence  induces  allelic  imbalances  of  chromatin  states 
(Gaulton  et  al.  2010;  Maurano  et  al.  2012).  Additionally,  the  disease-associated 
SNPs  can  also  alter  tissue-specific  enhancer-promoter  interactions  (Li  et  al.  2012; 
Maurano  et  al.  2012).  More  than  80  %  of  DHSs  containing  disease-associated  SNPs 
are  active  in  fetal  cells  and  tissues,  with  the  greatest  enrichment  for  SNPs  linked 
with  phenotypes  for  which  gestation  or  early  growth  have  been  shown  to  play  major 
roles  (such  as  cardiovascular  disease)  and  with  a  relative  depletion  for  SNPs  linked 
with  aging-related  diseases  (Maurano  et  al.  2012).  This  finding  is  in  agreement  with 
the  recurring  theme  of  chromatin  landscape  plasticity  during  early  development  and 
the  risk  for  specific  common  adult  diseases  (Sandovici  et  al.  2008,  2011). 
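
The enrichment statements in this section compare the fraction of disease-associated SNPs falling in DHSs with the fraction expected from a background set. As a hedged illustration of how such a concentration can be tested, the sketch below computes a fold enrichment and a one-sided hypergeometric p-value. All counts are invented for the example, and the function names are hypothetical. Published analyses additionally correct for linkage disequilibrium and use matched control SNPs, which this sketch does not attempt.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which are 'successes' (here: SNPs that lie inside a DHS)."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def fold_enrichment(k, N, K, n):
    """Observed fraction in DHSs (k/n) divided by background fraction (K/N)."""
    return (k * N) / (n * K)

# Hypothetical counts, chosen only to illustrate the calculation.
N = 1000  # background SNPs tested
K = 100   # background SNPs that fall in DHSs of the cell type of interest
n = 50    # SNPs associated with the disease
k = 20    # disease-associated SNPs that fall in those DHSs

p_value = hypergeom_upper_tail(k, N, K, n)
enrichment = fold_enrichment(k, N, K, n)
print(f"fold enrichment = {enrichment:.1f}, one-sided p = {p_value:.2e}")
```

Repeating the same calculation with DHS sets from different cell types is one simple way to ask which cell types show the strongest concentration of variants for a given disease.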


2.7    Concluding  Remarks  and  Future  Perspectives 

The recent epigenomics studies have begun to uncover in great detail the epigenetic
landscape of pluripotent stem cells and the transitions that occur during cell
differentiation.  Despite  the  advances  made  by  these  studies,  our  understanding  of 
the  functional  role  of  particular  epigenetic  mechanisms  remains  relatively  poor. 
This limitation highlights the need for more mechanistic studies. It has become
apparent that some of the epigenetic features discovered by studying pluripotent stem
cells  are  acquired  during  their  culture  in  vitro  and  may  not  reflect  the  patterns 
existing  in  vivo.  For  example,  ES  cells  cultured  in  defined  medium  with  inhibitors 
of  two  kinases  (MEK  [MAP  kinase/ERK  kinase]  and  GSK3B  [glycogen  synthase 
kinase  3  beta]),  a  condition  known  as  "2i,"  postulated  to  establish  a  naive  ground 
state, have reduced prevalence of bivalent domains, despite having differentiation
potential similar to that of serum-grown ES cells (Marks et al. 2012). Additionally, genome-
wide DNA methylation profiling of a large collection of human pluripotent stem
cell lines has revealed many aberrations of this epigenetic mark, which were
specific  to  the  culture  conditions  used  (Nazor  et  al.  2012).  Currently  it  is  still 
difficult  to  investigate  lineage  specification  and  the  associated  establishment  of 
global  epigenetic  patterns  for  many  cell  types  without  expanding  them  in  vitro. 
However,  as  the  number  of  cells  required  for  epigenetic  analyses  continues  to 
decrease,  it  is  likely  that  these  exciting  studies  will  become  possible  in  the  near 
future. 

The recent completion of the ENCODE project represents a milestone achievement
for our better understanding of the human genome and epigenome. However,
the information obtained is still not comprehensive. For example, just 11 of more
than  60  known  histone  modifications  were  analyzed  in  only  46  out  of  147  cell  types 
included  in  ENCODE  and  the  real  number  of  histone  modifications  may  be  even 
larger  (Tan  et  al.  2011).  Most  of  the  other  assays  were  performed  only  in  small 
subsets  of  cell  types,  suggesting  that  the  data  obtained  so  far  may  represent  only  a 
fraction  of  the  potential  functional  information  encoded  in  the  human  genome.  An 
important future goal is therefore to fill these gaps, for example through current
complementary international projects such as the NIH Roadmap Epigenomics
Mapping  Consortium  (Bernstein  et  al.  2010),  Alliance  for  the  Human  Epigenome 
and  Disease  (AHEAD)  (Jones  et  al.  2008),  and  BLUEPRINT  (Adams  et  al.  2012). 
Important  continuations  of  the  original  ENCODE  project  are  also  provided  by  the 
modENCODE  project,  set  to  identify  functional  elements  in  selected  model 
organisms  (Drosophila  melanogaster  and  Caenorhabditis  elegans)  (Celniker 
et  al.  2009)  and  Mouse  ENCODE  project  (Mouse  ENCODE  Consortium  2012). 
A better integration between ENCODE and GWAS data is likely to have a significant
impact on our understanding of common human diseases. This will be further
enhanced  by  the  recent  completion  of  the  1000  Genomes  project  (The  1000 
Genomes Project Consortium 2012).


Chapter 3

X-Chromosome  Inactivation 


Wendy  P.  Robinson,  Allison  M.  Cotton,  Maria  S.  Penaherrera, 
Samantha  B.  Peeters,  and  Carolyn  J.  Brown 


Abstract  The  dimorphism  of  the  sex  chromosomes  has  led  to  some  of  the  most 
dramatic  epigenetic  phenomena  known  in  order  to  achieve  dosage  compensation 
between  the  sexes.  Mammalian  X-chromosome  inactivation  (XCI)  requires  the 
differential  treatment  of  two  essentially  identical  chromosomes  in  the  same  nuclear 
environment.  XCI  has  thus  been  the  subject  of  considerable  study  in  mouse  as  a 
paradigm  for  epigenetic  choice,  although  less  is  known  about  the  timing  and  initial 
events  of  XCI  in  humans  and  other  species.  XCI,  as  can  be  visualized  by  the  spots 
on  a  calico  cat,  is  also  a  dramatic  example  of  the  stability  of  epigenetic  silencing, 
since  the  inactivation  state  of  an  X  chromosome  (X)  is  faithfully  inherited  through 
subsequent  somatic  cell  divisions.  Studies  to  understand  the  layering  of  epigenetic 
modifications  that  result  in  such  stable  silencing  have  been  reviewed  elsewhere,  and 
in  this  review  we  focus  instead  on  the  translation  of  our  growing  understanding  of 
the  epigenetic  phenomena  of  XCI  to  human  disease. 

X-linked disease is epitomized by an excess of affected males, but the characterization
as dominant or recessive belies the complexity of the contribution of XCI.
Notably,  whether  or  not  X-linked  disease  is  apparent  in  females  is  considerably 
impacted by the extent of skewing of XCI in the individual. Furthermore, the
unique biology of the sex chromosomes impacts the likelihood of X-linked disease
in  females  due  to  de  novo  mutation  rates.  In  addition,  XCI  does  not  result  in 
complete  dosage  equivalence  between  males  and  females;  however,  the  extent  to 
which there are sex, and even interindividual, differences due to XCI has not yet
been  well  elucidated.  Overall,  while  it  is  the  Y  chromosome  (Y)  that  determines 
sex,  the  X  contributes  in  a  complex  fashion  to  the  sex  differences  in  disease 
frequency  and  severity. 


3.1    Mosaicism  and  Skewing  in  XCI 


A  consequence  of  X-chromosome  inactivation  (XCI)  is  that  females  are  mosaic  for 
the  expression  of  the  X-linked  genes  subject  to  inactivation.  This  confounds  the 
classic  concept  of  "dominant"  or  "recessive"  inheritance,  as  the  ability  of  a  normal 
allele  to  compensate  for  a  mutation-carrying  allele  will  depend  on  whether  the  gene 
product  functions  cell  autonomously,  as  well  as  on  the  interactions  between  normal 
and  mutation-expressing  cells  and  their  products.  Furthermore,  despite  XCI  being  a 
random  process,  most  tissue  samples  will  show  some  detectable  degree  of  bias 
(skewing)  towards  inactivation  of  either  the  maternal  or  paternal  X.  The  proportion 
of  cells  expressing  the  mutant  copy  can  vary  from  0  to  100  %,  and  this  proportion  can 
vary  between  tissues  sampled  and  between  sites  within  a  tissue.  Skewed  expression  of 
one  or  the  other  gene  copy  can  occur  due  to:  biased  initial  inactivation;  restricted 
precursor  cell  pool  size  at  the  time  of  XCI  or  during  early  development;  and/or  active 
selection  (see  Fig.  3.1).  It  is  important  to  understand  the  dynamics  of  random  and 
nonrandom  XCI  during  development  to  be  able  to  interpret  the  potential  phenotype  of 
an  X-linked  mutation  in  a  female  carrier.  Assessment  of  XCI  skewing  in  female 
carriers  of  an  X-linked  mutation  can  be  useful  in  the  clinical  setting  in  certain 
situations,  but  needs  to  be  interpreted  with  considerable  caution. 


3.1.1    Causes  of  Skewing  of  XCI 

3.1.1.1    Skewing  Related  to  Choice  or  Biased  Initial  Inactivation 

A  primary  bias  in  the  choice  of  the  X  to  remain  active  could  arise  due  to  mutations 
in  the  X-inactivation  center,  including  the  X-inactive  specific  transcript  (XIST) 
involved  in  the  initiation  of  XCI  (see  Minks  et  al.  2008).  While  primary  skewing 
of  XCI  is  observed  in  mouse  (Cattanach  et  al.  1969),  in  humans  it  has  been  difficult 
to  discriminate  this  mechanism  from  early  selection  against  cells  inactivating  one  or 
the  other  X  due  to  mutation.  Rare  mutations  in  the  XIST  minimal  promoter  (Plenge 
et  al.  1997;  Tomkins  et  al.  2002)  have  been  associated  with  skewed  XCI  or  failure 
to  inactivate  the  mutated  X,  and  also  have  been  reported  to  impact  the  binding 
ability of the boundary factor CCCTC-binding factor (zinc finger protein) CTCF
(Pugacheva et al. 2005), although genome-wide surveys have not shown CTCF
enrichment at this location in somatic cells. Such mutations were not observed in
any of 66 women presenting with skewed XCI of unknown etiology (Pereira and
Zatz 1999), so are not a common cause of skewing. A pedigree with unexpected
hemophilia A expression in females showed linkage to a region within Xq25 that
overlapped similar regions identified in previous studies (Cau et al. 2006; Naumova
et al. 1998). Within this region, stromal antigen 2 (STAG2) was identified as a
candidate gene for a role in XCI choice in humans due to its function as part of the
cohesin complex and interaction with CTCF (Renault et al. 2011). Overall, it
remains to be determined whether there are variants affecting the initial choice of
X to inactivate in humans.

[Figure 3.1 panel labels: Random XCI; Choice; Pool Size (reduction in effective
precursor pool); Selection (selective cell growth depends on alleles on active X);
Variability between females; Variability within females (patch size relative to
sample size influences skewing measurement)]

Fig. 3.1 The origins of skewed XCI and sources of variability within and between females.
Females may show nonrandom XCI due to a primary bias in choice of X to inactivate; due to a
secondary growth advantage of one population; or due to a reduction in the size of the pool of cells
contributing to the embryo. Within any female there can be further variability in XCI skewing
between tissues, with age, or due to sampling


3.1.1.2    Skewing  Due  to  Selection 

Selection  as  a  mechanism  for  nonrandom  XCI  is  based  on  differential  cellular 
growth  or  cell  survival  following  random  XCI.  The  clearest  examples  are  provided 
by  structural  chromosomal  abnormalities  in  which  preferential  survival  of  those 
cells with the abnormal X inactivated is observed. In balanced X;autosome
translocations (t(X;A)) selection typically favors those cells where the normal
X is inactive. In contrast, unbalanced t(X;A) typically show preferential
inactivation of the t(X;A) (reviewed in Leppig and Disteche 2001). The degree of
skewing  depends  on  the  timing  and  efficiency  of  selection;  however,  the  end  result 
generally  favors  the  balanced  expression  of  autosomal  and  X-linked  genes  (Leppig 
and  Disteche  2001;  Schluth  et  al.  2007). 

Although  selection  is  an  attractive  mechanism  to  explain  skewed  XCI,  most 
women  found  to  have  extremely  skewed  XCI  have  normal  chromosomes.  Among 
45  women  with  skewed  XCI  (>80  %)  there  was  not  an  increase  in  copy  number 
variants  when  screened  by  high  resolution  array  for  cryptic  X  deletions  and 
duplications (Jobanputra et al. 2012). Only a single 5.5-Mb deletion was identified
in  one  female  with  skewed  XCI,  suggesting  that  cryptic  chromosomal  abnormalities 
may  only  rarely  account  for  skewing. 

Mutations  within  X-linked  genes  can  also  contribute  to  skewed  XCI.  While 
random  XCI  is  generally  observed  in  female  carriers  for  most  X-linked  gene 
disorders,  there  are  conditions  where  the  mutation  results  in  skewed  XCI  with 
preferential expression of the normal allele of the gene or, more rarely, the mutant
allele,  as  is  seen  for  adrenoleukodystrophy  (reviewed  in  Orstavik  2009;  Salsano 
et  al.  2012).  Depending  on  the  mutation,  skewing  can  occur  shortly  after  XCI  and 
thus  be  constitutional,  or  may  be  limited  to  specific  tissues  in  which  the  gene  is 
expressed. For example, mutations in forkhead box P3 (FOXP3) cause dysfunctional
T-cells leading to immune dysregulation polyendocrinopathy enteropathy
(IPEX syndrome) in males, yet carrier women are usually unaffected. Skewed XCI
in carrier females is limited specifically to the CD4+CD25hi regulatory T-cells,
despite  random  XCI  being  observed  in  naive  and  memory  CD4+  T-cells  (Di  Nunzio 
et  al.  2009).  Therefore,  evaluating  skewing  of  XCI  in  a  sample  of  peripheral  blood 
lymphocytes  from  such  cases  will  not  be  informative  for  identifying  the  presence  of 
skewing  in  the  clinically  relevant  subset  of  cells  that  are  responsible  for  the  clinical 
manifestations  of  disease. 


3.1.1.3    Skewing  Related  to  Precursor  Pool  Size 

In  the  absence  of  selection,  the  distribution  of  skewing  in  a  population  of  tissue 
samples  depends  on  (1)  the  number  of  cells  present  in  the  precursor  pool  at  the  time 
an  X  is  initially  marked  to  be  inactivated,  (2)  the  number  of  cells  from  this  precursor 
pool  which  contribute  to  the  development  of  that  particular  tissue,  and  (3)  the 
number  of  cell  descendants  which  remain  closely  associated  (patch  size)  in  the 
tissue  relative  to  the  sample  size  analyzed.  The  skewing  values  in  a  population  of 
samples  from  blood  in  newborns  are  largely  normally  distributed  and  consistent 
with  derivation  from  a  precursor  pool  of  less  than  16  cells  (Amos-Landgraf 
et  al.  2006).  Similarly,  the  number  of  precursors  contributing  to  blood  was 
estimated  at  4-5  precursors,  whereas  the  less  extreme  XCI  values  in  buccal  samples 
suggested  a  precursor  pool  size  of  16  cells  (Monteiro  et  al.  1998).  These  numbers 
assume  biases  result  only  from  stochastic  forces  and  may  be  underestimates  if 
selection  plays  a  significant  role. 
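
The dependence of skewing on precursor pool size can be illustrated with a simple binomial model. The sketch below assumes that each precursor cell independently inactivates the maternal or the paternal X with probability 0.5, that all precursors contribute equally to the tissue, and that there is no selection. These are simplifying assumptions made for illustration, and the function name `prob_skew_at_least` is hypothetical. It is not the estimation procedure used in the cited studies.

```python
from math import comb

def prob_skew_at_least(n_precursors, threshold_pct, p=0.5):
    """Probability that the majority XCI pattern reaches threshold_pct percent
    of the precursor pool, when each of n_precursors cells independently
    inactivates a given X with probability p."""
    total = 0.0
    for k in range(n_precursors + 1):
        major = max(k, n_precursors - k)  # cells sharing the more common pattern
        # Integer comparison avoids floating-point issues at the boundary.
        if major * 100 >= threshold_pct * n_precursors:
            total += comb(n_precursors, k) * p**k * (1 - p) ** (n_precursors - k)
    return total

skew_table = {
    n: (prob_skew_at_least(n, 80), prob_skew_at_least(n, 90))
    for n in (4, 8, 16, 64)
}
for n, (p80, p90) in skew_table.items():
    print(f"{n:>3} precursors: P(skew >= 80%) = {p80:.4f}, P(skew >= 90%) = {p90:.4f}")
```

Under these assumptions a pool of 4 precursors gives skewing of at least 80 % by chance alone in 12.5 % of samples, whereas a pool of 16 does so in only about 2 %. This is the reasoning by which an observed distribution of skewing values can be used to bound the pool size.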


[Figure 3.2: "Percent of women exhibiting different degrees of skewing by age";
x-axis: degree of skewing (50-59 %, 60-69 %, 70-79 %, 80-89 %, 90-100 %)]

Fig. 3.2 The effect of age on extent of skewed XCI in blood. Older females show more
nonrandom patterns of XCI. N = 68 for 0-19 years; N = 146 for 20-39 years; N = 121 for
40-59 years; N = 80 for 60-89 years

Precursor  pool  size  may  vary  between  individuals,  and  in  some  situations  the 
embryo may derive from a highly reduced number of these precursors. For example,
skewed XCI is frequently seen in newborns from pregnancies complicated by
confined  placental  mosaicism,  a  situation  where  significant  levels  of  trisomy  are 
found  exclusively  in  the  placenta.  Skewed  XCI  is  presumably  the  result  of  trisomic 
cells  being  present  in  the  embryonic  precursor  pool  at  the  time  of  XCI  that  do  not 
contribute  significantly  to  the  differentiated  embryo  (Lau  et  al.  1997).  Similarly, 
completely skewed XCI is increased in Prader-Willi syndrome patients with uniparental
disomy, a situation thought to frequently arise in association with confined
placental  trisomy  (Lau  et  al.  1997;  Butler  et  al.  2007). 


3.1.1.4    Skewing  Increases  with  Age 

Numerous  studies  have  shown  that  skewing  gradually  increases  with  age  in  samples 
of  peripheral  blood  (Busque  et  al.  1996;  Sharp  et  al.  2000;  Hatakeyama  et  al.  2004; 
Knudsen  et  al.  2007;  Bolduc  et  al.  2008).  Based  on  our  own  data  for  415  healthy 
females,  the  proportion  of  extreme  skewing  (>90  %)  is  less  than  10  %  at  younger 
ages  but  nearly  25  %  in  women  over  60  years  of  age  (Fig.  3.2).  The  distribution  of 
skewing  varies  from  study  to  study  and  may  be  subject  to  lab  and  assay-based 
differences  (methodologies  are  discussed  below).  Over  a  2-year  period  the  level  of 
skewing  in  peripheral  blood  samples  is  remarkably  stable  at  all  ages  (van  Dijk 
et  al.  2002)  likely  reflecting  the  stability  of  hematopoietic  stem  cell  populations 
over  limited  time  frames.  It  is  probable  that  the  increase  in  skewing  over  time  is  a 
combination  of  a  slight  proliferative  advantage  of  cells  that  have  inactivated  one  X 
over  the  other  as  well  as  random  stochastic  loss  of  stem  cell  precursors  (Christensen 
et al. 2000). Interestingly, extreme skewing (>90 %) was more frequent in neutrophils
than in T-cells of elderly women (33 % vs. 9 %) (Gale and Linch 1998).




Blood  is  typically  used  to  assess  skewing  as  it  is  more  readily  accessible  than 
other  tissue  sources;  however,  caution  is  needed  as  the  skewing  in  the  tissue  of 
interest  to  a  specific  disease  process  may  differ  from  blood  (see  Fig.  3.1).  Skewing 
in  different  somatic  tissues  is  correlated  (Sharp  et  al.  2000;  Bolduc  et  al.  2008; 
Bittel  et  al.  2008),  but  the  proportion  of  samples  showing  moderate  or  extreme 
skewing  tends  to  be  less  in  skin  and  muscle  (Gale  et  al.  1994)  or  buccal  cells 
(Bolduc  et  al.  2008).  Furthermore,  this  correlation  decreases  with  age,  suggesting 
either  differences  in  selection  or  stochastic  processes  affecting  skewing  ratios  over 
time  in  different  tissues. 


3.1.1.5  Skewing  in  Twins 

There  is  little  evidence  that  monozygotic  (MZ)  twin  pairs  show  skewed  XCI  more 
often  than  dizygotic  twins  (Monteiro  et  al.  1998).  There  is,  however,  a  significant 
correlation in degree and direction of XCI skewing between MZ twin pairs dependent
on the developmental timing of twinning. Monoamniotic, monochorionic twin
pairs  display  the  highest  correlation  in  skewing  (Monteiro  et  al.  1998;  Chitnis 
et  al.  1999)  reflecting  their  relatively  later  derivation  from  a  common  pool  of 
cells  after  the  commitment  to  XCI.  Nonetheless,  there  can  be  substantial  differences 
between  XCI  skewing  in  monozygotic  twins,  particularly  dichorionic  MZ  pairs,  and 
this may account for some phenotypic discordance in otherwise genetically identical
pairs. MZ twin discordance in the presentation of X-linked disorders such as
Fragile  X  syndrome  and  Duchenne  muscular  dystrophy  (DMD)  (Richards 
et  al.  1990)  has  been  attributed  to  differing  XCI  ratios. 

3.1.1.6  Is  Skewing  Heritable? 

Since skewing can reflect all the processes discussed above (and outlined in
Fig. 3.1), it is challenging to determine whether skewed XCI is heritable in humans.
Clearly  transmission  of  X  rearrangements  or  mutations  that  are  accompanied  by 
skewed  XCI  will  lead  to  familial  skewed  XCI,  but  evidence  for  heritability  in  the 
general  population  is  limited.  In  a  study  of  38  control  families,  there  was  a 
correlation  in  skewed  XCI  between  sister-sister  pairs  but  not  in  mother-daughter 
pairs  (Naumova  et  al.  1998).  A  study  of  over  500  mother-neonate  pairs  also  showed 
no  correlation  between  degree  of  skewing  or  incidence  of  extreme  skewing  (Bolduc 
et  al.  2008),  suggesting  that  genetics  has  little  influence  on  degree  and  direction  of 
skewing  in  the  general  population. 

A  separate  question  is  whether  increased  skewing  of  XCI  with  age  is  a 
heritable  trait.  Such  a  tendency  could  explain  the  correlation  in  skewing  between 
adult  siblings  but  not  when  newborns  are  compared  to  older  relatives.  The 
concordance  in  skewing  between  elderly  twin  pairs  was  used  to  infer  that 
acquired  skewing  with  age  results  from  a  small  selective  advantage  of  cells 
with  one  or  the  other  X  active,  while  stochastic  events  and  depletion  of  the 
stem cell pool may contribute to acquired skewing to some smaller degree
(Kristiansen et al. 2005; Vickers et al. 2001). Interestingly, the offspring
(aged ~60-80) of centenarians exhibited less XCI skewing than comparably
aged  females  of  non-long-lived  parents  (Gentilini  et  al.  2012),  possibly  due  to 
reduced  turnover  of  the  hematopoietic  stem  cell  pool,  as  they  also  have  a  lower 
incidence  of  disease.  While  intriguing,  the  study  sample  was  relatively  small  and 
the  cause  of  this  association  remains  to  be  shown.  Overall,  it  appears  that  the 
initial  skewing  ratios  observed  in  newborns  are  largely  determined  by  stochastic 
effects,  but  that  over  time  subtle  genetic  differences  can  affect  both  the  degree 
and  direction  of  skewing,  at  least  in  blood. 


3.1.2    Skewing  of  XCI:  Impact  on  Disease 

Most  sex-linked  disorders  cannot  readily  be  classified  as  dominant  or  recessive 
(see  also  Dobyns  et  al.  2004),  since  males  are  hemizygous,  and  female 
heterozygotes  are  generally  mosaics  of  two  different  cell  populations  due  to  XCI. 
While it is difficult to classify X-linked disorders into clear patterns of inheritance, it
is perhaps useful to consider the following (see Fig. 3.3): (1) recessive-like, in
which  females  are  generally  not  affected;  (2)  dominant-like,  in  which  females 
generally  are  affected;  and  (3)  disease  caused  by  mosaic  XCI  in  females  with 
males  generally  unaffected. 

3.1.2.1    Recessive-Like:  Males  Affected  and  Generally 
Unaffected  Females 

Heterozygous  females  may  not  manifest  an  X-linked  disorder  when  they  are  near 
random  in  their  XCI  pattern  because  the  mosaic  expression  of  the  normal  allele  is 
sufficient  to  prevent  disease.  An  example  is  the  metabolic  cooperation  between 
cells with the active normal or mutant allele for mucopolysaccharidosis type II
(Hunter  syndrome),  in  which  the  deficiency  is  corrected  via  cell  to  cell  transfer 
(Migeon  2006,  and  see  Orstavik  2009  for  additional  examples).  In  other  disorders 
females  are  unaffected  because  of  a  tendency  towards  a  skewed  pattern  of  XCI 
favoring  the  normal  allele  being  kept  active.  Such  selection  may  be  evident  in 
several different tissues as seen for alpha-thalassemia X-linked intellectual disability
syndrome (ATRX) (Gibbons et al. 1992), or may be tissue specific as in the
IPEX  syndrome  discussed  above. 

In this category of X-linked disorders, females may sometimes show limited
signs or be affected with the disease for a variety of reasons. First, 45,X females
(Turner syndrome) can be affected as the dosage of X-linked genes is equivalent to
that of hemizygous affected males, as has been observed in an affected 45,X girl
with DMD (Chelly et al. 1986). Second, unfortunate lyonization, or extreme
skewing, can result in the mutant allele being preferentially active. Balanced
t(X;A)s can disrupt a gene at the site of the breakpoint and also result in skewed XCI,
leading to manifestation of disorders such as DMD (Boyd and Buckle 1986).
Interestingly, it has been suggested that increased nonrandom XCI with age may
lead to loss of heterozygosity and potentially drive the onset of some X-linked
disorders in elderly carrier women. This was suggested as a possible mechanism
for glucose-6-phosphate dehydrogenase (G6PD) deficiency (Au et al. 2006) or late
onset X-linked sideroblastic anemia (Cazzola et al. 2000).

Fig. 3.3 The effect of skewed XCI on phenotype. For cell-autonomous traits, females will be
mosaics; however, the extent of phenotypic expression will depend on the skewing of XCI in the
female. For most X-linked disorders the hemizygous male will manifest the trait, while females
will show the trait only when heavily skewed (recessive-like), or unless favorably skewed
(dominant-like). In rare cases the presence of mosaicism causes the disease, in which case males
will be spared


3.1.2.2    Dominant-Like:  Males  Affected  or  Male  Lethal; 
and  Females  Often  Affected 

Dominant-like  disorders  are  those  in  which  females  and  males  generally  show  signs 
of disease. Typically males are more severely affected than females and in numerous
cases the disorder is incompatible with survival in hemizygous males such that
only affected females are observed. In some such disorders, such as focal dermal
hypoplasia (Goltz-Gorlin syndrome) (Clements et al. 2009), there is surprisingly no
clear correlation between XCI and phenotype. Occasionally, skewing towards
inactivation of the mutant allele, or fortunate lyonization, will result in unaffected
female carriers (e.g., Rett syndrome; Dayer et al. 2007), while predominant inactivation
of the normal allele will result in more severely affected females (e.g., Fabry
disease; Dobrovolny et al. 2005). Interestingly, rare exceptions to the male lethal
phenotype  of  some  conditions  have  been  reported.  These  are  often  individuals  with 
supernumerary  Xs  or  having  somatic  mosaicism  for  the  mutation;  reports  of  this 
include  males  with  Rett  syndrome  (Smeets  et  al.  2012),  Aicardi  syndrome  (Hopkins 
et  al.  1979),  and  focal  dermal  hypoplasia  (Wang  et  al.  2007). 


3.1.2.3    Mosaicism  Dependent  Phenotype:  Females 
Affected  More  Severely  than  Males 

Paradoxically,  for  some  X-linked  diseases  heterozygous  females  have  greater 
disease  severity  than  hemizygous  males.  For  craniofrontonasal  syndrome,  which 
is caused by mutations of ephrin-B1 (EFNB1), a marker of tissue boundary formation,
it has been hypothesized that the patchwork loss of the gene expression in
females  may  disturb  tissue  boundary  formation  (Twigg  et  al.  2004).  In  hemizygous 
males  an  alternative  mechanism,  perhaps  due  to  the  promiscuity  of  the  ephrin 
ligand/receptor  system,  maintains  the  proper  boundary  formation;  but  in  the  female 
carriers  mosaicism  for  the  mutation  causes  disordered  growth.  The  coexistence  of 
normal  and  mutant  allele  products  is  suggested  to  lead  to  "cellular  interference"  and 
abnormal  phenotype  (Wieacker  and  Wieland  2005).  A  similar  mechanism  has  been 
suggested  for  early  infantile  epileptic  encephalopathy  (Ryan  et  al.  1997).  It  is  worth 
considering  whether  the  presence  of  mosaicism  for  X-linked  polymorphisms  could 
contribute  to  other  disorders  with  female  preponderance.  For  example,  if  there  are 
patches  of  the  brain  that  are  genetically  distinct  by  virtue  of  expressing  one  or  the 
other  X  this  could  affect  the  manner  in  which  those  patches  of  cells  communicate 
with  one  another. 


3.1.2.4    Skewed  XCI  and  Complex  Disorders  for  Which 
the  Cause  Is  Unknown 

Increased skewing is observed in some autoimmune disorders affecting predominantly
women, including scleroderma (Ozbalkan et al. 2005), autoimmune thyroid
disease  (Ozcelik  et  al.  2006),  and  juvenile  arthritis  (Uz  et  al.  2009),  but  not  in  others 
such  as  systemic  lupus  erythematosus.  Intriguingly,  monosomy  X,  which  may  often 
be  mosaic,  is  also  associated  with  autoimmunity,  leading  to  the  suggestion  that  a 
breakdown  of  self-tolerance  may  result  from  the  loss  of  mosaic  XCI  in  T-cells  over 
time  (reviewed  in  Ozcelik  et  al.  2006).  As  the  X  has  a  large  number  of  genes  and 
microRNAs  affecting  immune  function,  mutations  in  a  number  of  which  lead  to 
autoimmune  disease  (Bianchi  et  al.  2012),  loss  of  heterozygosity  for  X-linked 
mutations  or  polymorphisms  that  affect  immune  function  could  also  be  playing  a 
role  in  some  autoimmune  disorders. 
Patients exhibiting skewed XCI and autoimmune thyroiditis showed an
increased  rate  of  pregnancy  loss  (Ozcelik  et  al.  2006).  Furthermore,  the  X  is 
over-represented  for  genes  involved  in  reproductive  function.  Skewed  XCI  has 
been  associated  with  recurrent  miscarriage  (RM,  generally  defined  as  three  or  more 
consecutive  pregnancy  losses)  in  several  studies  (Lanasa  et  al.  1999;  Uehara 
et  al.  2001;  Beever  et  al.  2003b).  A  similar  association  was  not  found  with  the 
less  strict  criteria  of  two  miscarriages,  not  consecutive,  used  in  two  other  studies 
(Hogge  et  al.  2007;  Warburton  et  al.  2009).  In  one  large  pedigree  ascertained  for 
increased  miscarriage  a  deletion  on  the  X  was  identified  (Pegoraro  et  al.  1998). 
If  mutations  affecting  male  survival  were  a  common  explanation  for  skewed  XCI 
in  women  experiencing  RM  a  preponderance  of  karyotypically  normal  males 
among  the  RM  would  be  anticipated,  which  has  not  been  observed  (Stephenson 
et  al.  2002).  A  meta-analysis  supported  an  association  between  skewed  XCI  and 
RM (Su et al. 2011), though this association remains controversial. Skewed XCI has
also  been  associated  with  primary  premature  ovarian  failure  (POF)  (Bretherick 
et  al.  2007;  Sato  et  al.  2004).  Skewing  in  these  females  might  reflect  restricted 
embryo  precursor  size  leading  to  compromised  growth  of  germ  cells,  or  the 
presence  of  X-linked  mutations  as  POF  is  associated  with  a  number  of  X 
rearrangements  implicating  several  gene  regions  on  the  X  (Toniolo  and 
Rizzolio  2007). 


3.1.3    Assessment  of  Skewing 

Since  skewing  of  XCI  can  impact  the  phenotype  of  female  carriers,  XCI  assays  are 
widely  used  diagnostic  tools  for  these  X-linked  conditions.  However,  their  reliable 
application in clinical medicine requires clear definitions of the phenotype, correction
for possible age-related biases, corroboration of the phenotype-XCI skewing
associations  in  independent  datasets  and  specific  tissues,  awareness  of  the  existing 
limitations  of  information  available  (particularly  for  rare  conditions)  and  the  use  of 
reliable  assays  that  are  in  linkage  disequilibrium  with  the  gene/region  of  interest. 

Numerous  assays  used  for  determining  XCI  patterns  have  been  described.  These 
include electrophoretic assays using protein isoforms (G6PD); RFLPs (e.g.,
phosphoglycerate kinase 1 (PGK1)); or STR-based assays using PCR following
methylation-sensitive enzyme digestion (e.g., androgen receptor (AR), fragile X
mental retardation 1 (FMR1), monoamine oxidase A (MAOA), zinc finger
MYM-type 3 (ZNF261), zinc finger DHHC-type containing 15 (ZDHHC15), SLIT
and NTRK-like family member 4 (SLITRK4) and proprotein convertase subtilisin/
kexin type 1 inhibitor (PCSK1N)). Each is challenged by technical issues such as
incomplete digestion, microsatellite amplification stutter and biases that complicate
the quantification of the products and can result in discordance between their results
(reviewed in Beever et al. 2003a; Bertelsen et al. 2011).

The  most  widely  used  assay  in  clinical  practice  examines  DNA  methylation  at 
the  AR  gene.  This  assay  is  informative  in  >90  %  of  females  and  evaluates  the 
methylation pattern of two HpaII restriction sites adjacent to a polymorphic CAG
repeat in the 1st exon of AR in Xq12 (Allen et al. 1992). Restriction enzyme
independent assays have been designed to interrogate the same region using
methylation-specific PCR (M-PCR) following sodium bisulfite modification
(Kubota et al. 1999). An expression-based AR assay that measures allelic expression
has been shown to largely correlate with the methylation-based assay (Busque
et al. 1994) and with other transcription-based assays (Bolduc et al. 2008). Additional
assays based on the quantitative expression of polymorphic X-linked genes,
also  known  as  transcriptional  clonality  assays  (qTCAs),  have  been  developed  but 
require  access  to  RNA.  While  some  studies  have  shown  highly  concordant  results 
(Busque  et  al.  2009),  there  are  reports  of  discordance  between  qTCAs  and  AR 
(methylation  and  expression)  results  that  have  been  attributed  to  highly  variable 
methylation  of  the  AR  gene  in  granulocytes  (Swierczek  et  al.  2012).  Discordance 
between assays is a concern that needs to be addressed, as it is not yet clear which
assay  provides  the  most  accurate  reflection  of  XCI  patterns. 
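
As an illustration of how a skewing value is typically derived from a methylation-based AR assay, the sketch below normalizes the peak area of each allele after HpaII digestion by its peak area in the undigested sample, which corrects for preferential amplification of one allele. Because HpaII cuts unmethylated (active X) templates, the digested product reflects the inactive X. This normalization is a commonly used convention rather than one specified in this chapter, and the function and parameter names are hypothetical.

```python
def ar_inactive_fraction(digested_a1, digested_a2, undigested_a1, undigested_a2):
    """Fraction of cells in which AR allele 1 lies on the inactive X.

    Inputs are peak areas for the two AR alleles with and without HpaII
    digestion (assumed convention; names are illustrative only).
    """
    if undigested_a1 <= 0 or undigested_a2 <= 0:
        raise ValueError("undigested peak areas must be positive")
    # Normalize each digested peak by the matching undigested peak.
    r1 = digested_a1 / undigested_a1
    r2 = digested_a2 / undigested_a2
    if r1 + r2 == 0:
        raise ValueError("no product detected after digestion")
    return r1 / (r1 + r2)


def degree_of_skewing(fraction):
    """Express skewing on the 50-100 % scale, irrespective of direction."""
    return max(fraction, 1 - fraction) * 100


if __name__ == "__main__":
    frac = ar_inactive_fraction(900, 100, 1000, 1000)
    print(f"allele 1 inactive in {frac:.0%} of cells; "
          f"skewing = {degree_of_skewing(frac):.0f} %")
```

Incomplete digestion and amplification stutter, as noted above, distort the peak areas themselves and are not corrected by this normalization.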

An  important  final  consideration  is  that  tissue  specificity  of  XCI  patterns  (Gale 
et  al.  1994)  limits  the  conclusions  that  can  be  established  based  on  the  assessment  of 
a  single  tissue,  highlighting  the  need  for  appropriate  tissue  and  age-matched 
controls  in  all  XCI  studies.  In  clinical  practice,  skewing  is  typically  measured  in 
peripheral  blood  for  its  accessibility;  however,  this  tissue  is  not  necessarily  the  most 
informative  and  relevant  in  certain  conditions. 


3.1.4    Skewed  XCI  to  Assess  Clonal  Patch  Size 

XCI  has  long  been  used  to  assess  clonality  in  tumors  and  other  types  of  lesions; 
however,  this  analysis  can  be  complicated  by  several  considerations  (Chen  and 
Prchal  2007).  The  tumor  may  have  incorporated  cells  of  differing  origins,  such  as 
vascular or inflammatory cells, and thus appear to be polyclonal. Alternatively, there
may  be  a  subsequent  expansion  of  a  dominant  clone  even  if  the  lesion  was 
originally  polyclonal  in  origin.  Furthermore,  due  to  the  stability  of  XCI  in  cells 
descended  from  a  common  precursor,  there  is  a  natural  patchiness  to  XCI  patterns; 
thus  an  important  consideration  for  clonality  analysis  in  tumors  is  the  patch  size  of 
the  cells  from  which  such  lesions/tumors  may  arise.  For  example,  a  neoplastic 
origin  of  atherosclerotic  lesions  has  been  argued  based  on  the  observation  of 
nonrandom  XCI  in  the  lesion.  However,  studies  of  XCI  suggest  that  human  arteries 
grow  by  expansion  of  smooth  muscle  cell  clones  with  little  mixing  with  adjacent 
clones,  resulting  in  a  patch  length  often  >4  mm,  and  suggesting  that  plaques  may 
simply  arise  from  preexisting  developmentally  normal  clones  of  cells  (Chung 
et  al.  1998). 

Assays  of  XCI  skewing  estimate  the  patch  size  of  epidermis  to  reflect  an 
estimated  20-350  basal  cells  (Asplund  et  al.  2001).  This  implies  a  much  finer 
scale  substructure  than  reflected  by  the  hyper-  and  hypo-pigmented  skin  patches 
seen  to  follow  the  lines  of  Blaschko  in  X-linked  incontinentia  pigmenti  that  had 
been  proposed  to  represent  XCI  clonal  patterns.  Utilizing  XCI  to  estimate  clonal 
patch  size,  epidermis  samples  of  2  mm  diameter  were  generally  clonal,  while 
patches of 4-5 mm typically displayed a mixed XCI status (Chaturvedi et al. 2002).
A  monoclonal  origin  of  individual  crypts  in  both  colon  and  small  intestine  was 
demonstrated  by  G6PD  staining  in  multiple  tissues  from  a  heterozygote  (Novelli 
et  al.  2003).  In  colon,  clusters  of  up  to  450  adjacent  crypts  showed  a  similar  staining 
pattern,  while  in  small  intestine  a  much  smaller  patch  size  was  observed,  with  the 
epithelial  lining  of  the  villi  sometimes  showing  mixed  staining  patterns.  Both  the 
myoepithelial and epithelial cells from individual breast ducts were of the same
monoclonal origin and thus breast tumors, which tend to arise from individual
ducts, would be expected to show skewed XCI even if having a multicellular origin
(Novelli et al. 2003; Diallo et al. 2001). While thyroid follicles may be polyclonal
in origin (Novelli et al. 2003), monoclonal patch size in thyroid tissue was estimated at
48-128 mm² or 1-4 × 10⁵ cells (Jovanovic et al. 2003).

Endometrial  tissue  shows  dramatic  regenerative  capacity  during  the  menstrual 
cycle  with  new  proliferation  deriving  from  stem  cells  within  the  endometrial 
glands.  Glands  located  within  an  ~1  mm  area  generally  showed  the  same 
inactive  X,  whereas  samples  spanning  >2  mm  showed  a  mixed  pattern  of  XCI 
(Tanaka  et  al.  2003).  Nonetheless,  when  paired  4-9  mm  samples  were  analyzed 
from  two  distinct  locations  in  endometrium,  myometrium  and  cervix  derived  from 
tissues of 11 hysterectomies, a correlation in skewing measurements between the
paired sites was observed (Mutter et al. 1996). The distribution of skewing was used
to infer that each of these tissues derives from a stem cell pool of 10-12 cells and that
one  sample  can  be  representative  of  the  average  skewing  in  the  tissue  as  a  whole. 
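
One simple way such an inference can be made is from the spread of skewing values across samples. The sketch below is an assumption-laden illustration, not necessarily the approach of the cited study: if each of n precursor cells chooses its inactive X independently with probability 0.5, the fraction of cells with a given X active has variance 1/(4n), so n can be estimated as 0.25 divided by the observed variance about 0.5. Measurement error, patchiness, and selection all inflate the variance and therefore bias the estimate downward.

```python
def estimate_pool_size(active_fractions):
    """Estimate precursor pool size from per-sample fractions (0 to 1)
    of cells with a given X active, assuming purely binomial sampling.
    """
    if not active_fractions:
        raise ValueError("at least one sample is required")
    # Variance is taken about the expected value of 0.5, not the sample mean.
    variance = sum((f - 0.5) ** 2 for f in active_fractions) / len(active_fractions)
    if variance == 0:
        raise ValueError("no variation observed; pool size cannot be estimated")
    return 0.25 / variance


if __name__ == "__main__":
    samples = [0.25, 0.75, 0.5, 0.5]  # hypothetical skewing measurements
    print(f"estimated pool size: {estimate_pool_size(samples):.1f} cells")
```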

Determining  the  relationship  between  skewed  XCI  in  different  tissues  and  in 
different  samples  from  the  same  tissue  provides  a  molecular  assessment  of  the 
developmental  history  of  a  particular  tissue  that  can  then  be  used  to  understand  how 
genetic  and  epigenetic  variation  arises  in  development.  For  example,  the  mean 
levels  of  XCI  skewing  for  placental  amnion  and  chorion  are  correlated,  suggesting  a 
common  developmental  origin  from  inner  cell  mass  derivatives  subsequent  to  XCI, 
while  average  skewing  in  placental  trophoblast  was  uncorrelated  with  either 
amnion or chorion, consistent with its origin from the trophectoderm of the blastocyst
(Penaherrera et al. 2012). Furthermore, the patterns of XCI skewing in trophoblast
were consistent with a monoclonal origin of individual chorionic villus trees
and  large  patch  size.  In  contrast,  different  sites  of  amnion  taken  from  the  same 
placenta  showed  a  high  degree  of  correlation  consistent  with  a  high  degree  of 
intermixing  of  cells  and  little  "patchiness."  Overall,  skewing  of  XCI  can  provide 
important  information  on  the  developmental  origins  of  cell  populations,  but  care 
must  be  taken  to  ensure  that  the  normal  mosaic  patterns  of  clonality  in  females  are 
adequately  considered. 


3.2    De  Novo  Mutations  in  X-Linked  Disease 

For counseling of families with X-linked disease, it is important to identify
non-manifesting carriers. Skewing of XCI has sometimes been used to identify carriers
within a family (reviewed in Puck and Willard 1998); however, as discussed above,
there can be multiple causes for skewed XCI. Furthermore, new mutations arise at a
substantial frequency: if an X-linked disorder is severe enough to reduce the
reproductive fitness of males, then for the disease frequency to remain constant in
the population there must be sufficient new mutations to "replace" the alleles lost.
While  in  theory  these  de  novo  mutations  could  be  of  maternal  origin,  the  high 
frequency  of  carrier  mothers  observed  in  Lesch-Nyhan  disease  led  to  the  suggestion 
that  the  de  novo  mutation  rate  was  higher  in  males  (Francke  et  al.  1976).  Genome 
sequencing  has  now  demonstrated  that  the  male  mutation  rate  is  elevated  throughout  the 
genome (Kong et al. 2012). Higher paternal mutation rates have been seen for hemophilia
A (Leuer et al. 2001) and B (Green et al. 1999), adrenoleukodystrophy (Wang
et al. 2011), and X-linked hypophosphatemic rickets (Durmaz et al. 2013). The excess
of paternal X-linked mutations is both gene and mutation type specific. For example, in
DMD and Rett syndrome point mutations are predominantly paternal, while deletions
or insertions can be maternal in origin (Grimm et al. 1994; Zhang et al. 2011).
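
The "replacement" argument can be made quantitative with the classical mutation-selection balance for an X-linked recessive disorder. The sketch below assumes a rare allele at equilibrium, normal fitness of carrier females, and per-generation mutation rates u in eggs and v in sperm; the function names are illustrative. Under these assumptions the incidence of affected males is (2u + v)/(1 - f), where f is the relative fitness of affected males, and the fraction of affected males whose mothers are not carriers is (1 - f)/(2 + k), with k = v/u.

```python
def affected_male_incidence(u_female, v_male, male_fitness):
    """Equilibrium incidence of affected males: (2u + v) / (1 - f)."""
    if not 0 <= male_fitness < 1:
        raise ValueError("male_fitness must satisfy 0 <= f < 1")
    return (2 * u_female + v_male) / (1 - male_fitness)


def fraction_new_mutation_cases(male_to_female_ratio, male_fitness):
    """Fraction of affected males with a non-carrier mother: (1 - f) / (2 + k)."""
    if not 0 <= male_fitness < 1:
        raise ValueError("male_fitness must satisfy 0 <= f < 1")
    return (1 - male_fitness) / (2 + male_to_female_ratio)


if __name__ == "__main__":
    for k in (1, 4, 10):  # hypothetical male:female mutation rate ratios
        frac = fraction_new_mutation_cases(k, male_fitness=0.0)
        print(f"k = {k:>2}: {frac:.2f} of affected males have non-carrier mothers")
```

With equal mutation rates in the two sexes and a genetically lethal disorder, one third of affected males carry a new mutation. As the male mutation rate rises relative to the female rate, more mothers of affected males are themselves carriers, which is the pattern that prompted the suggestion of a higher male mutation rate in Lesch-Nyhan disease.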

The  presence  of  germ  line  or  somatic  mosaicism  has  important  counseling 
implications  that  depend  on  the  gene  and  mutation.  Somatic  mosaicism  has  been 
observed  in  8-12  %  of  mothers  of  children  with  X-linked  hemophilia  but  not 
adrenoleukodystrophy  (Wang  et  al.  2011).  Somatic  mutations  in  tumor  suppressor 
genes  can  contribute  to  the  development  of  cancer  and  generally  require  two  hits  to 
inactivate  both  alleles;  however,  as  the  X  is  functionally  hemizygous,  only  a  single 
hit is required. There are a limited number of X-linked genes known to be recurrently
mutated in cancer, including APC membrane recruitment protein
1  (AMER1),  ATRX,  FOXP3,  and  PHD  finger  protein  6  (PHF6),  the  latter  of 
which  has  been  reported  to  have  somatic  mutations  preferentially  on  the  paternal 
X  (Van  Vlierberghe  et  al.  2011).  In  general,  one  would  anticipate  that  XCI  would 
result  in  both  males  and  females  having  a  single  active  allele  subject  to  equivalent 
mutation  probability.  However,  for  those  genes  that  escape  XCI,  females  might 
show  lower  cancer  frequency.  With  improved  understanding  of  the  genes  that 
escape  XCI  such  sexual  dimorphisms  could  be  better  understood. 
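
The expected difference between genes subject to XCI and genes that escape it can be illustrated with a deliberately simplified calculation. It assumes that each expressed copy of a tumor suppressor is inactivated independently with the same probability per cell lineage, and that loss of function requires every expressed copy to be hit; both the model and the numerical value are hypothetical.

```python
def loss_of_function_probability(per_copy_hit_prob, expressed_copies):
    """Probability that all expressed copies are inactivated, assuming
    independent hits with equal probability on each copy."""
    if not 0 <= per_copy_hit_prob <= 1:
        raise ValueError("per_copy_hit_prob must be between 0 and 1")
    if expressed_copies < 1:
        raise ValueError("expressed_copies must be at least 1")
    return per_copy_hit_prob ** expressed_copies


if __name__ == "__main__":
    p = 1e-3  # hypothetical per-copy inactivation probability
    one_copy = loss_of_function_probability(p, 1)   # males; females if gene is subject to XCI
    two_copies = loss_of_function_probability(p, 2)  # females if gene escapes XCI
    print(f"one expressed copy:  {one_copy:.1e}")
    print(f"two expressed copies: {two_copies:.1e}")
```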


3.3  Gaps  in  Dosage  Compensation 
3.3.1    Genes  That  Escape  from  XCI 

The  unique  evolutionary  history  of  the  sex  chromosomes  has  driven  the  need  for 
dosage  compensation;  but  this  compensation  is  incomplete,  with  approximately 
15  %  of  human  genes  escaping  from  XCI  (Carrel  and  Willard  2005).  The  eutherian 
X and Y diverged from each other approximately 150-160 million years ago (reviewed in
Livernois  et  al.  2012).  Once  the  proto-Y  obtained  the  sex-determining  region  Y 
(SRY)  gene,  step-wise  decay  of  the  Y  ensued  whenever  recombination  with  the  X 
was  inhibited  by  genomic  rearrangements  such  as  inversions.  This  ratcheted  loss  of 
Y  homology  has  resulted  in  "evolutionary  strata"  on  the  X  (Lahn  and  Page  1999). 

Fig. 3.4 Approximately 15 % of genes escape XCI. (a) The majority of genes identified to escape
from XCI are located in regions of the X that are more recently diverged from the Y. (b) Using
cSNPs to examine expression in females with nonrandom XCI, the level of expression from the
inactive X was shown to be variable, but averages 42 % (Carrel and Willard 2005). (c) With
somatic cell hybrids the number of escape genes was found to be lower, and for a substantial
portion of the genes, expression was different between Xs in different hybrids. (d) DNA methylation
differences between females and males at X-linked CpG-island promoters can be used to
predict genes subject to XCI (e.g., G6PD), escaping from XCI (e.g., L1CAM) or escaping from
XCI in only some tissues (e.g., SAT). S = subject to XCI, E = escapes XCI


The  most  recent  stratum  is  the  pseudoautosomal  region  (PAR),  a  region  that  still 
pairs  and  undergoes  recombination  between  the  X  and  Y  during  male  meiosis.  The 
boundary of the PAR is highly divergent between eutheria, and the human PAR1,
which contains ~25 genes in 2.7 Mb of DNA, is smaller than that of most other eutheria
except mouse (Ross et al. 2005).

Lyon first hypothesized XCI in 1961, and shortly thereafter suggested that genes
in the PAR would not require dosage compensation (Lyon 1962). Indeed, all PAR1
genes examined to date escape XCI (Carrel and Willard 2005). Interestingly, two of
the four PAR2 genes examined maintain dosage equivalence by silencing both the
inactive X- and Y-linked genes (D'Esposito et al. 1996; De Bonis et al. 2006). As
evolutionary time since divergence from the Y homolog increases, so does the
probability that genes are subject to XCI (as shown in Fig. 3.4a). The escapees
outside of the PARs include genes with functional Y homologs, such as zinc
finger protein, X-linked (ZFX) and ribosomal protein S4, X-linked (RPS4X), as well
as genes with nonfunctional Y homologs, such as Kallmann syndrome 1 sequence
(KAL1), or without any apparent remaining Y homology such as carbonic
anhydrase VB, mitochondrial (CA5B).

In mammals, the degradation of the Y led not only to dosage imbalance between
males and females but also to a potential halving of expression levels. While there has
been some controversy, it appears that dosage compensation in eutherians involves
both up-regulation of the active X in males and females and XCI to achieve dosage
equivalency between the sexes. Comparisons of X and autosomal gene expression
are complicated by the presence of considerably more tissue-specific genes on the X,
such as genes involved in reproduction; however, with the exclusion of such genes
there is up-regulation of gene expression from the active X (see Deng et al. 2011
and references therein). Chromosomal control of up-regulation is suggested by
the fact that up-regulation does not occur in all tissues: dosage compensation is
maintained in the gametes by not up-regulating the single X of sperm or the two
active Xs of oocytes (Nguyen and Disteche 2006). Dosage compensation is likely
critical only for a subset of genes, in particular those whose products are involved
in large protein complexes (Pessia et al. 2012).


3.3.2    Methodology  for  Determining  XCI  Status 

It is becoming apparent that silencing of a gene by XCI is not an all-or-none
epigenetic phenomenon, but rather can vary between individuals and tissues,
as well as being nuanced in the level of expression from the inactive X. The
sensitivity to detect such variability depends upon the approach used to determine
the inactivation status of a gene.

3.3.2.1    Expression  Analysis  in  Heterozygous  Females 

As discussed, heterozygous carrier females can show mosaic expression, or selective
skewing of XCI, both of which can be used as evidence that a gene is subject to
XCI. Assessment of total RNA levels between males, females, and individuals with
X aneuploidies has consistently identified some genes that escape XCI, but will
also identify sex- or aneuploidy-associated differences. To determine if expression is
from one or both chromosomes, one can use RNA-FISH or, more commonly,
expression of a heterozygous polymorphism in a female with skewed XCI. In
such clonal populations, biallelic expression reflects expression from both the
active and the inactive X. In the most extensive survey of biallelic expression to
date, Carrel and Willard (2005) evaluated 93 genes in clonal cell lines (see
Fig. 3.4b). While this is the most direct assay to detect
expression from the inactive X, the need for an expressed SNP in a clonal cell
population, or single-cell analysis, limits how informative this approach can be.
Allelic expression imbalances can also be quantitated by RNA sequencing if the cell
population being examined is skewed for XCI, as has been demonstrated in mouse
using cells from a cross between Mus musculus and Mus spretus to identify genes
that escape XCI (Yang et al. 2010).


3.3.2.2    Expression  Analysis  in  Somatic  Cell  Hybrids 

The  human  inactive  X  can  be  isolated  from  the  human  active  X  in  mouse/human 
somatic  cell  hybrids,  allowing  direct  analysis  of  expression  of  human  genes  by 
human-specific RT-PCR (see Fig. 3.4c). Hybrids have been shown to lose localization
of the XIST RNA that is essential for the initiation (Penny et al. 1996) but not
maintenance  (Brown  and  Willard  1994)  of  XCI;  however,  in  the  Carrel  and  Willard 
survey  91  genes  were  examined  by  both  SNP  expression  and  hybrid  approaches  and 
the  inactivation  status  was  comparable  (Carrel  and  Willard  2005). 


3.3.2.3    Assessment  of  Epigenetic  Marks 

Indirect  assays  for  the  XCI  status  of  X-linked  genes  have  been  developed  that 
rely  upon  the  epigenetic  marks  that  are  associated  with  an  active  or  inactive  X. 
Analogous  to  the  direct  study  of  expression,  the  assessment  of  marks  associated 
with  gene  activity  needs  to  be  combined  with  allelic  differences  in  a  clonal 
population  of  cells  to  identify  whether  there  is  monoallelic  or  biallelic  presence 
of  the  mark.  A  number  of  X-linked  promoters  from  genes  escaping  XCI  were  shown 
to  have  biallelic  RNA  polymerase  II  association  (Kucera  et  al.  2011).  In  contrast, 
for marks that are associated with the silent allele, such as promoter DNA methylation,
the presence of the mark can be considered as evidence for inactivation of the
gene,  without  the  need  for  polymorphisms  and  clonal  cells  or  single-cell  analysis. 
Thus,  DNA  methylation  has  been  a  very  popular  approach  to  examine  XCI  status  of 
genes,  with  several  groups  having  reported  studies  in  a  variety  of  tissues  (Cotton 
et  al.  2011;  Sharp  et  al.  2011;  Yasukochi  et  al.  2010).  While  expression  or  active 
mark studies are limited to genes expressed in the tissue examined, DNA methylation
and other heterochromatic marks, such as H3K27me3 (Berletch et al. 2010),
appear  to  be  present  even  when  the  gene  is  not  expressed  in  the  assayed  tissue. 
These  marks  provide  the  opportunity  to  study  tissue-specific  genes  not  assessed  in 
the usual surveys that examine expression in fibroblasts or blood. DNA methylation,
however, is generally only correlated with inactivation at genes with
CpG-island promoters, which represent approximately 60-70 % of X-linked genes.
Combining  results  from  analyzing  epigenetic  marks  with  expression  analyses  will 
allow  the  most  complete  assessment  of  inactivation  status  for  X-linked  genes. 
However, the compilation of a catalog of inactivation status for X-linked genes is
confounded by the variability that is now being revealed between different tissues
and different individuals.


3.3.3    Variability in XCI

Rather  than  a  dichotomous  "on"  or  "off,"  expression  from  the  inactive  X  shows 
variability  in  the  level  of  expression,  the  percent  of  individuals  in  which  the  gene  is 
expressed,  and  the  tissues  in  which  the  gene  escapes  inactivation.  Thresholds  have 
been  set  to  establish  XCI  status  in  surveys;  however,  these  variabilities  need  to  be 
considered  on  a  gene-by-gene  basis  in  the  clinic. 


3.3.3.1    Variability  in  Level  of  Expression  from  the  Inactive  X 

Lower expression of a gene from the inactive X than from the active
X was a feature noted early in the study of genes escaping XCI (Migeon et al. 1982),
and Carrel and Willard (2005) established a "cutoff" of 10 %
expression from the inactive X as the definition for "escape from XCI"
(see Fig. 3.4). As greater depth of quantitative data is acquired
through approaches such as RNA-seq, the extent of this variability will be better
quantified. It is likely that expression from the inactive X spans the spectrum from
silent to active, and it will be important to identify the biological point at which
expression from the inactive X becomes relevant, whether because of increased expression
in females, the ability to avoid X-linked disease in females due to ongoing expression
from the non-mutated X, or the sensitivity or resilience of individuals with aneuploidies
to disease. Depending on the approach used to identify expression from the
inactive X, low-level expression might reflect some cells expressing fully and
others silencing completely. In the cases when RNA-FISH or single-cell PCR
has been used to examine expression, it appears that cells have reduced expression
from the inactive X (Carrel and Willard 1993); however, variability is also observed
in which genes escape XCI in different tissues (see below).
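
The 10 % cutoff can be illustrated with a short sketch. It assumes allele-specific read counts from a clonal or fully skewed cell population, so that one allele marks the active X (Xa) and the other the inactive X (Xi). The function names, the choice to express Xi output relative to Xa, and the inclusive boundary are illustrative assumptions, not details taken from the survey.

```python
def xi_expression_fraction(xa_reads: int, xi_reads: int) -> float:
    """Expression from the inactive X (Xi) as a fraction of the active X (Xa).

    Assumes the cell population is clonal or completely skewed, so that
    each allele can be assigned unambiguously to Xa or Xi.
    """
    if xa_reads <= 0:
        raise ValueError("need at least one read from the active X allele")
    if xi_reads < 0:
        raise ValueError("read counts cannot be negative")
    return xi_reads / xa_reads


def classify_by_level(xa_reads: int, xi_reads: int, cutoff: float = 0.10) -> str:
    """Call 'escape' when Xi expression reaches the cutoff (assumed inclusive)."""
    fraction = xi_expression_fraction(xa_reads, xi_reads)
    return "escape" if fraction >= cutoff else "subject"


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    for xa, xi in [(200, 30), (200, 10)]:
        print(xa, xi, classify_by_level(xa, xi))
```

A single threshold of this kind hides the continuous range of Xi expression discussed above, which is why the cutoff is exposed as a parameter rather than fixed.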


3.3.3.2    Variability  Between  Females  in  Escape  from  XCI 

A survey of expression in a panel of nine inactive-X-containing somatic cell hybrids
showed that for many genes expression is observed in a subset of the hybrids.
A cutoff of 2/9 (22 %) expressing hybrids (Carrel and Willard 2005; see Fig. 3.4c) was
established for the designation of a gene as subject to XCI, seen for 52 % of genes, and
expression in 7/9 or more hybrids was classified as "escape," seen for 13 % of genes. Such
variability, with a gene escaping inactivation in some, but not all, females, has
also been demonstrated using expressed SNPs (Anderson and Brown 1999; Carrel
and Willard 1993; Kucera et al. 2011). Prediction of escape from XCI by DNA
methylation also supports there being a substantial number of genes that escape
XCI in some females (Cotton et al. 2011). This variable expression does not appear
to be regulated at the level of the chromosome, as females who express one variable
gene are not more likely to express another one. While it is unknown what local
features lead to variable escape from XCI, hypoacetylation of the metallopeptidase
inhibitor 1 (TIMP1) gene seemed to predispose to expression from the inactive X
(Anderson and Brown 2005).
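
The hybrid-panel thresholds described above can be written as a small classifier. This is a minimal sketch: the label "variable" for genes falling between the two thresholds, the inclusive treatment of both boundaries, and the function name are assumptions made for illustration.

```python
from fractions import Fraction

SUBJECT_MAX = Fraction(2, 9)  # at most 2 of 9 hybrids expressing: subject to XCI
ESCAPE_MIN = Fraction(7, 9)   # at least 7 of 9 hybrids expressing: escape


def classify_hybrid_panel(expressing: int, total: int = 9) -> str:
    """Classify a gene from the number of inactive-X hybrids that express it."""
    if total <= 0 or not 0 <= expressing <= total:
        raise ValueError("expressing must lie between 0 and total")
    share = Fraction(expressing, total)  # exact arithmetic avoids float edge cases
    if share <= SUBJECT_MAX:
        return "subject"
    if share >= ESCAPE_MIN:
        return "escape"
    return "variable"  # assumed label for the intermediate class


if __name__ == "__main__":
    for n in range(10):
        print(n, classify_hybrid_panel(n))
```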

3.3.3.3  Variability  Between  Tissues  in  Escape  from  XCI 

Tissue-specific escape from XCI was reported for mouse lysine (K)-specific
demethylase 5C (Kdm5c) (Carrel et al. 1996). Using DNA methylation levels as a
means of identifying genes that escape XCI, Cotton et al. (2011) predicted the
proportion of genes escaping XCI to range from 9 % in blood to 25 % in neural
tissue, although the sample size was small. Figure 3.4d shows
an example of this variability.

3.3.3.4  Is  "Escape"  Reactivation? 

As  a  final  caveat,  when  a  gene  is  said  to  escape  XCI,  it  is  not  known  whether  it  fails 
to  respond  to  the  initial  inactivating  signal,  or  whether  it  tends  to  reactivate  in  a 
substantial  proportion  of  cells.  For  mouse  genes,  expression  can  be  monitored  early 
in  development  and  it  was  shown  for  one  gene  that  there  is  initial  silencing  followed 
by  reactivation  (Lingenfelter  et  al.  1998).  It  seems  likely  that  both  resistance  to 
inactivation  and  reactivation  contribute  to  this  phenomenon.  Distinguishing 
between these alternatives will be important not only for understanding the
mechanisms of XCI, but also for assessing whether reactivation may have implications for
disease predisposition with aging.


3.3.4   Impact  of  Escapee  Genes  on  Phenotype 

Escape  from  XCI  will  result  in  higher  gene  expression  in  female  cells  relative  to 
male  cells  and  will  eliminate  the  mosaicism  in  female  carriers  to  the  extent  that  the 
two Xs are equivalently expressed. A growing number of X-linked disease genes
have been found to escape XCI, including steroid sulfatase (microsomal),
isozyme S (STS), KAL1, inhibitor of kappa light polypeptide gene enhancer in
B-cells, kinase gamma (IKBKG), and premature ovarian failure 1B (POF1B), and
the variabilities discussed above may contribute to the variable outcomes seen in
carriers. For PAR genes, dosage will be balanced, and the inheritance pattern
distinct  from  both  X  and  autosomal  genes.  The  presence  of  a  Y  homolog  might 
compensate  for  other  escape  genes;  however,  it  was  shown  in  mouse  brain  that 
several  genes  with  Y  homologues  displayed  divergence  of  gene  regulatory 
sequences  between  the  X  and  Y  (Xu  et  al.  2008a,  b). 

Genes escaping XCI have a pronounced effect on phenotype in X-chromosomal
aneuploidies. The majority of 45,X conceptuses are lost prior to birth, presumably due
to the deficiency of gene products normally expressed from both the X and the Y.
Gain of an X has less severe phenotypic consequences; however, reduced lifespan
and some recurrent features may be attributable to over-expression of genes
escaping XCI (reviewed in Yang et al. 2011). Additionally, presence of an additional
5 % of the genome may in itself be detrimental to some cellular processes.

Genes  that  escape  from  XCI  have  the  potential  to  contribute  to  a  sex  difference 
in  disease  susceptibility.  Indeed,  expression  of  genes  that  are  variable  in  escape 
between females could contribute to inter-female differences as well. In addition,
genes that are expressed from the inactive X will be resistant to acquired somatic
mutations in females, which could be protective against loss of tumor suppressor
gene activity, resulting in a male predominance of some cancers. In mice a clever
breeding  strategy  has  utilized  an  Sry-deficient  Y  chromosome  and  Sry  transgenics 
to  generate  XX  and  XY  female  mice  and  XXSry  and  XY  male  mice  to  separate  the 
impact  of  the  sex  chromosomes  from  that  of  the  Sry  sex-determining  gene.  Such 
mice  have  been  used  to  demonstrate  a  role  for  the  XX  chromosome  complement 
in  various  disorders  including  autoimmune  disease  (Smith-Bouvier  et  al.  2008). 
Interestingly, mice appear to have fewer genes that escape from XCI than humans
(Berletch et al. 2011), and therefore this mouse model may not reflect the full extent
of  sex  differences  attributable  to  the  X  in  human  disease  situations.  Another 
difference  between  humans  and  mice  is  the  presence  of  imprinted  genes  on  the 
mouse  X,  which  have  not  yet  been  identified  in  humans  (Sharp  et  al.  2011).  As 
males  always  inherit  only  a  maternal  X,  imprinted  genes  have  the  potential  for 
considerable  differences  in  expression  between  males  and  females.  A  further 
consideration for the impact of XCI on human disease is that because the inactive
X is largely facultative heterochromatin, it can act either as a sink for heterochromatic
proteins, or a tank, representing a potential storehouse of such proteins
(Blewitt et al. 2005; Juriloff and Harris 2012). With the emerging role of chromatin
remodelers in human disease and cancer, the presence of an inactive X may thus
have epigenetic impacts that have not yet been discovered. Overall, in considering
X-linked disease in females, the parental origin of new mutations, the incomplete
nature of dosage compensation, and the variability in XCI skewing between and
within females all need to be considered.


Chapter 4

Cis-  and  Trans-Effects  Underlying  Polar 
Overdominance  at  the  Callipyge  Locus 

Michel  Georges,  Haruko  Takeda,  Huijun  Cheng,  Xu  Xuewen, 
Tracy  Hadfield-Shay,  Noelle  Cockett,  and  Carole  Charlier 


Abstract  The  callipyge  phenotype  is  a  muscular  hypertrophy  of  sheep  that  is 
characterized  by  a  unique  mode  of  inheritance,  referred  to  as  polar  overdominance, 
in  which  only  heterozygous  individuals  inheriting  the  CLPG  mutation  from  their 
sire express the phenotype. We herein report recent advances towards understanding
the molecular mechanisms underlying polar overdominance. They involve an
interplay between cis- and trans-effects of the CLPG mutation (Fig. 4.1).

Fig. 4.1 Working model for polar overdominance at the ovine callipyge locus. The boxes
illustrate the expression profile of the BEGAIN-DIO3 domain in skeletal muscle according to
CLPG genotype. ^: CLPG mutation


4.1    The Callipyge Phenotype


"Callipyge" (from Greek calli-: beautiful and pyge: buttocks) is the name given to a
heritable muscular hypertrophy of sheep (Cockett et al. 1994). It was first observed
on a farm in Oklahoma in the 1980s, where it reportedly affected ~10-15 % of the
offspring of a ram named "Solid Gold." When slaughtered at the same live weight as
controls, individual muscles of callipyge lambs weigh on average 38 % more. The
hypertrophy is more pronounced for the muscles of the pelvic limb (average: 42 %;
range: 12-58 %) and torso (average: 50 %; range: 51-39 %) than for the thoracic
limb (average: 14 %; range: 6-22 %). The muscular hypertrophy is sufficiently
pronounced to allow experienced observers to unambiguously recognize animals
with the condition. Callipyge animals do not grow faster than controls, but are more
feed efficient. The meat of callipyge animals is leaner yet tougher than that of
controls, and this has hampered their widespread commercial use. The muscular
hypertrophy was shown to be due to an increase in the proportion and diameter of
fast twitch glycolytic muscle fibers (Koohmaraie et al. 1995; Carpenter et al. 1996;
Jackson et al. 1997a-c; Freking et al. 1998a, b, 1999; Freking and Leymaster 2006).

4.2  Polar  Overdominance 

When mated to wild-type ewes, callipyge rams descended from Solid Gold produced
~50 % callipyge offspring, irrespective of gender. In these crosses, the callipyge
phenotype thus appeared to be autosomal dominant. A linkage scan conducted
under this model indeed identified a locus on distal chromosome 18 that fully
accounted for the segregation of the trait in these pedigrees. The locus and
corresponding mutation were labeled CLPG (Cockett et al. 1994). Surprisingly,
the reciprocal mating (namely between callipyge ewes and wild-type rams) produced
only wild-type offspring, despite transmission of the CLPG haplotype to
half of the offspring, as expected. This nonequivalence of reciprocal crosses
suggested that the CLPG locus might be subject to parental imprinting with
epigenetic silencing of the madumnal (i.e., transmitted by the mother) allele. In
agreement with the imprinting hypothesis, wild-type sons having received the
CLPG haplotype from their callipyge dam (genotype CLPGMat/+Pat) produced
50 % callipyge offspring when mated to unrelated wild-type ewes (genotype +/+)
(Cockett et al. 1996).

Parental imprinting, however, did not account for the observation that only
~25 % of offspring from crosses between callipyge sires and dams (genotype:
+Mat/CLPGPat) expressed the phenotype. Assuming a silenced madumnal CLPG
allele, +Mat/CLPGPat as well as CLPG/CLPG offspring were expected to be
callipyge. Yet, while +Mat/CLPGPat offspring were indeed found to be callipyge in
these matings, their CLPG/CLPG (i.e., homozygous for the CLPG haplotype) sibs
were not. We have called this unusual non-Mendelian inheritance pattern (in which
only heterozygous individuals inheriting the mutation from a sex-specified parent
have a superior phenotype) "polar overdominance" (Cockett et al. 1996). It was
subsequently confirmed by others (f.i. Freking et al. 1998a).

The polar overdominance model predicted that matings between phenotypically
wild-type CLPG/CLPG rams and wild-type +/+ ewes should yield 100 % callipyge
offspring, and this was indeed found to be the case (Cockett et al. 1996).
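
The segregation ratios quoted in this section follow from one rule: an animal is callipyge only if it carries a wild-type madumnal allele and a CLPG padumnal allele. The sketch below enumerates the four equally likely gamete combinations of a cross under that rule. Genotypes are written as (madumnal allele, padumnal allele) pairs; the function and variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

WT, CLPG = "+", "CLPG"


def is_callipyge(madumnal: str, padumnal: str) -> bool:
    """Polar overdominance: only +Mat/CLPGPat animals express the phenotype."""
    return madumnal == WT and padumnal == CLPG


def callipyge_fraction(dam: tuple, sire: tuple) -> Fraction:
    """Expected share of callipyge offspring from a cross.

    Each parent is a (madumnal, padumnal) genotype and transmits either of
    its two alleles with equal probability. The allele from the dam becomes
    the offspring's madumnal allele, the allele from the sire its padumnal one.
    """
    offspring = list(product(dam, sire))
    affected = sum(is_callipyge(m, p) for m, p in offspring)
    return Fraction(affected, len(offspring))


if __name__ == "__main__":
    callipyge = (WT, CLPG)      # +Mat/CLPGPat
    wild_type = (WT, WT)        # +/+
    homozygote = (CLPG, CLPG)   # CLPG/CLPG, phenotypically wild-type
    carrier_son = (CLPG, WT)    # CLPGMat/+Pat, phenotypically wild-type

    print(callipyge_fraction(dam=wild_type, sire=callipyge))    # 1/2
    print(callipyge_fraction(dam=callipyge, sire=wild_type))    # 0
    print(callipyge_fraction(dam=callipyge, sire=callipyge))    # 1/4
    print(callipyge_fraction(dam=wild_type, sire=homozygote))   # 1
    print(callipyge_fraction(dam=wild_type, sire=carrier_son))  # 1/2
```

The five crosses reproduce the ~50 %, 0 %, ~25 %, 100 %, and 50 % proportions reported in the text, which a simple imprinting model (silent madumnal allele) cannot do for the callipyge by callipyge cross.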

4.3  The BEGAIN-DIO3 Imprinted Domain

To decipher the molecular mechanism underlying polar overdominance, the CLPG
mutation was first fine-mapped to a ~4 Mb and then to a ~400 kb chromosome
interval on distal chromosome 18 (Fahrenkrug et al. 2000; Shay et al. 2001;
Berghmans et al. 2001). The interval was shown to encompass a novel imprinted
domain that was just being discovered (f.i. da Rocha et al. 2008).

The BEGAIN-DIO3 domain is now known to span ~1 Mb in eutherians and to
encompass eight genes subject to parental imprinting in placental mammals but not
in marsupials and monotremes (Charlier et al. 2001a; Paulsen et al. 2001; Edwards
et al. 2008). Strikingly, the four genes that are preferentially expressed from the
padumnal (i.e., transmitted by the father) allele are protein-coding genes (BEGAIN,
DLK1, RTL1 (also known as PEG11), and DIO3), while the four that are preferentially
expressed from the madumnal allele are long noncoding RNA genes
(lncRNAs) (GTL2 (also known as MEG3), RTL1-AS (also known as PEG11-AS),
RIAN (also known as MEG8), and MIRG (also known as MEG9)).

BEGAIN (brain-enriched guanylate kinase-associated protein) was originally
isolated from a rat brain cDNA library and shown to encode a protein that binds to
the guanylate kinase domain of PSD95/SAP90, a scaffolding protein at the
postsynaptic cell membrane (Deguchi et al. 1998). We have shown in sheep that
BEGAIN is actually very broadly expressed, easily detectable in multiple tissues
both pre- and postnatally. Alternative usage of two promoters (A and B) and
splicing generates multiple isoforms. Transcripts initiated at the B promoter exhibit
clear imprinting with predominant expression from the padumnal allele in most
tissues (but not liver). Transcripts initiated at the A promoter are biallelically
expressed in most tissues with, however, predominance of the padumnal allele in
kidney (Smit et al. 2005). Tierling et al. (2009) subsequently showed that BEGAIN
is likewise imprinted in the mouse in a tissue- and promoter-specific manner and
controlled by the IG-DMR located upstream of GTL2 (see below).

DLK1 (delta-like homologue 1; also known as preadipocyte factor-1 (PREF1) or
fetal antigen 1 (FA1)) encodes a so-called noncanonical Notch ligand characterized
by six EGF-like domains, a transmembrane domain, and a short intracellular tail. It
lacks the N-terminal Delta-Serrate-LAG-2 (DSL) domain, involved in receptor
binding, that is shared by all canonical Notch ligands. Full-length DLK1 possesses a
juxtamembrane TACE (ADAM17)-mediated cleavage site which, upon proteolysis,
releases a soluble form of DLK1. Alternative splice variants skip this site,
encoding membrane-bound DLK1 isoforms (f.i. Falix et al. 2012). DLK1 is
expressed in a wide range of tissues (including placenta, liver, adipose, skeletal
muscle, lung, vertebrae, pituitary and adrenal glands) before birth, while its expression
in adults is limited to the pituitary and adrenal glands, pancreas, central
nervous system, testes, prostate, and ovaries. DLK1 has been shown to play
essential roles in adipogenesis, hematopoiesis, neurogenesis, development of skeletal
muscle, liver, lung, pancreas, and pituitary gland, and adaptation to independent
life (f.i. Charalambous et al. 2012; Falix et al. 2012; Mirshekar-Syahkal et al. 2013;
Waddell et al. 2010). DLK1 appears up-regulated in specific tumors including
neuroblastoma, hepatoblastoma, and Wilms tumors (f.i. Falix et al. 2012). DLK1
is thought to interact directly with Notch receptors, thereby inhibiting them (f.i. Bray
et al. 2008). Soluble DLK1 secreted by niche astrocytes was recently shown to
stimulate proliferation of neural stem cells of the subventricular zone expressing
membrane-bound DLK1, suggesting signaling by direct interaction between the
two isoforms (Ferron et al. 2011).

RTL1 (Retroposon-like 1) is one of nine mammalian genes derived from ancestral
sushi-ichi LTR elements of the Ty3/gypsy family that were most likely acquired by
horizontal gene transfer (f.i. Youngson et al. 2005). RTL1 has lost its LTR sequences,
yet has maintained a highly conserved, intron-less ~4 kb open reading frame
homologous to the gag and pol genes of the retroelement from which it derives,
strongly suggesting eutherian-specific "exaptation." Knockout and overexpression
experiments indicate that RTL1 plays an essential role in the development of the
placenta, a eutherian-specific organ (Sekita et al. 2008). RTL1 has been found to be
heavily methylated on both alleles in all examined tissues.

DIO3 encodes the type 3 deiodinase (D3) that inactivates both T3 and T4 by
5-deiodination of the inner ring and acts locally to reduce TH availability (Tsai
et al. 2002). Placental expression of DIO3 protects the fetus against thyrotoxicosis
due to high levels of thyroid hormone in the maternal circulation.

The lncRNA GTL2 (Gene trap locus 2) encompasses at least 12 exons and
generates a panoply of alternatively spliced transcripts (Schuster-Gossler
et al. 1998). Surprisingly, although the exon-intron structure of GTL2 appears to
be conserved amongst eutherians, the exonic sequences per se are not more
conserved than the intronic sequences, raising questions about the nature of its
function (f.i. Charlier et al. 2001a). GTL2 has recently been shown to directly bind
the polycomb repressive complex 2 (PRC2), suggesting that it may guide PRC2 to
impose repressive chromatin marks at specific chromosomal target regions. GTL2
knock-down in embryonic stem cells causes a twofold increase in DLK1 transcript
levels and reduction in H3K27 trimethylation at the DLK1 promoter, suggesting
that one of the functions of GTL2 might be to cis-regulate the madumnal DLK1
allele by blocking its transcription (Zhao et al. 2010).

As its name implies, RTL1-AS is a lncRNA that is antisense to the intron-less
RTL1 gene. RTL1-AS forms six hairpin loops that are recognized by the DROSHA-
DGCR8 microprocessor complex to generate pre-miRNAs that will be further
processed to generate a quadrille of miRNAs (Seitz et al. 2003; Davis
et al. 2005). Being perfectly complementary to RTL1 over their entire length, this
set of miRNAs has indeed been shown to mediate slicing of RTL1 (assumed to be
mediated by AGO2), a mode of action of miRNAs which, although common in
plants, is exceptional in animals (Davis et al. 2005). The strand-specific acquisition
of miRNA precursors in a maternally expressed transcript that is antisense to a
paternally expressed gene encoding a protein that is essential for placental
development is a remarkable feat, thought to be driven by the evolutionary forces
underlying the kinship theory of parental imprinting (Haig 2000).

The most striking distinctive feature of the RIAN lncRNA gene is the fact that it
encodes more than 40 small RNAs resembling small nucleolar RNAs (snoRNAs) of
the C/D type falling in two main clusters (SNORD113 and SNORD114) (Cavaille
et al. 2002). The primary known function of C/D snoRNAs is the 2'-O-methylation
of rRNAs, small nuclear RNAs, and tRNAs. However, none of the C/D snoRNAs
from the BEGAIN-DIO3 domain exhibits significant sequence complementarity
to any one of these putative targets. Also, the C/D snoRNAs do not correspond
to peaks of evolutionary conservation in the RIAN gene (Caiment et al. 2010).
Twelve of the C/D snoRNAs have been shown to be precursors of miRNAs in sheep
(Caiment et al. 2010) and eight to be precursors of Piwi-interacting RNAs
(piRNAs) expressed in testis (Girard et al. 2006). C/D snoRNAs from the
BEGAIN-DIO3 domain have been shown to be overexpressed in some patients
with acute myeloid leukemia (AML), and their in vitro overexpression to induce
cell proliferation (reviewed in Girardot et al. 2012).

MIRG is the pri-miRNA of a cluster of ~50 miRNAs (Seitz et al. 2004; Glazov
et al. 2008; Caiment et al. 2010; reviewed in Girardot et al. 2012). The positions of
the MIRG miRNAs coincide with peaks of evolutionary conservation, testifying to
an essential, sequence-dependent function. A small subset of the MIRG miRNAs
undergoes A-to-I editing affecting the seed, which might modulate the target
spectrum of the corresponding miRNAs (Kawahara et al. 2007; Caiment
et al. 2010). The targets of the MIRG miRNAs remain essentially unknown, yet
seed-centered target prediction reveals an enrichment of regulators of the gene
circuitry operating at the transcriptional, translational, and posttranslational level,
primarily in the nervous system (f.i. Caiment et al. 2010). Experimental evidence
supports a role in the regulation of synaptic development and function. MIRG
miRNAs are found overexpressed in several cancers, and several of them have
been linked to stem cells and pluripotency (reviewed in Girardot et al. 2012).

Monoallelic, imprinted expression of all genes in the BEGAIN-DIO3 domain is
controlled by an intergenic germ line differentially methylated region (IG-DMR): an
~8 kb sequence associated with a CpG island located ~15 kb upstream of GTL2
(Takada et al. 2002). The IG-DMR is one of three imprinting control regions (ICR)
that acquire methylation in the male (rather than female) germ line (in addition to
the ICR of the Igf2-H19 and Rasgrf1 loci), while remaining unmethylated on
maternal transmission (Ferguson-Smith 2011). Deletion of the BEGAIN-DIO3
IG-DMR on the madumnal chromosome causes it to behave as a padumnal
chromosome in the embryo: expression of the protein-encoding BEGAIN, DLK1, RTL1,
and DIO3 genes, and silencing of the noncoding GTL2, RTL1-AS, RIAN, and MIRG
genes. Deletion of the IG-DMR on the padumnal chromosome does not affect its
behavior (Lin et al. 2003). Monoallelic expression is accompanied by somatic
methylation of secondary DMRs in at least the DLK1 and GTL2 genes (Takada
et al. 2002; reviewed in da Rocha et al. 2008). It is noteworthy that the effect of
madumnal IG-DMR deletion differs in placenta, causing expression of the
protein-encoding genes (normally only expressed from the padumnal allele) without
silencing of the noncoding RNA genes (Lin et al. 2007).


4.4    The  CLPG  Mutation 

A  reference  sequence  for  most  of  the  ovine  CLPG  domain  was  first  generated  by 
sequencing five BAC clones that jointly covered ~570 kb ranging from ~15 kb
upstream of BEGAIN to ~80 kb downstream of MIRG (Charlier et al. 2001a; Smit
et al. 2005). Two teams independently used a large-scale PCR-based strategy to
sequence the domain for a chromosome known to carry the CLPG mutation. The
two teams applied a similar strategy to select the animals to resequence: large
numbers of callipyge animals, known to be of +Mat/CLPGPat genotype, were
screened in order to identify individuals that would be autozygous for the CLPG
marker haplotype. Resequencing such individuals would simultaneously provide
sequence information of the haplotype carrying the CLPG mutation as well as of a
closely related (identical-by-descent) wild-type haplotype. The prediction was that
both  haplotypes  would  differ  at  a  very  small  number  of  sites,  including  the  CLPG 
mutation.  As  a  matter  of  fact,  both  efforts  identified  the  same  unique  variant 
differentiating  the  CLPG  and  +  allele:  an  A  to  G  transition  located  in  the  ~90  kb 
DLK1-GTL2  intergenic  region,  32.7  kb  upstream  of  the  GTL2  transcription  start 
site.  The  A  to  G  substitution  was  shown  to  affect  the  third  position  of  a  dodecamer 
motif  that  is  virtually  perfectly  conserved  amongst  eutherians.  In  sheep,  the  G  allele 
was  exclusively  observed  amongst  descendants  of  Solid  Gold.  Intriguingly,  the 
available  elephant  sequence  has  the  same  G  residue  at  the  corresponding  position 
(Freking  et  al.  2002;  Smit  et  al.  2003). 

Solid  Gold,  the  alleged  founder  of  the  callipyge  flock,  was  genotyped  for  the 
CLPG mutation using DNA extracted from leucocytes. He was found to be heterozygous
A/G. Yet, the allelic ratio departed markedly from the expected 1:1, being
closer to 0.8(A):0.2(G). Genotyping a panel of microsatellites excluded
leucochimerism, leaving mosaicism as the most likely explanation. This indicated that
the corresponding A to G transition occurred during early development of Solid
Gold, proving beyond any reasonable doubt that it is indeed the CLPG mutation.
The fact that only 15 % of Solid Gold's offspring were callipyge probably
reflects germ line mosaicism as well (Smit et al. 2003).
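
The arithmetic behind the mosaicism argument can be made explicit. The following Python sketch is an illustration only, not part of the original analysis. It assumes a simple two-population mosaic (A/A cells and A/G cells), and, for the germ line, that the dams were wild-type and that every lamb inheriting the G allele from Solid Gold was callipyge. The function name is hypothetical.

```python
def heterozygous_cell_fraction(minor_allele_fraction: float) -> float:
    """Fraction of diploid cells that are A/G in a mosaic of A/A and A/G cells.

    Assumption: each A/G cell contributes one G among its two alleles, so the
    G allele fraction in pooled DNA is half the fraction of A/G cells.
    """
    if not 0.0 <= minor_allele_fraction <= 0.5:
        raise ValueError("allele fraction must lie between 0 and 0.5")
    return 2.0 * minor_allele_fraction


# Leucocyte DNA: A:G ratio close to 0.8:0.2 (value taken from the text).
leucocyte_mutant_cells = heterozygous_cell_fraction(0.2)

# Germ line: about 15 % callipyge offspring, read here as the fraction of
# sperm transmitting G (assumes wild-type dams and full penetrance).
germline_mutant_cells = heterozygous_cell_fraction(0.15)

print(f"A/G leucocytes: ~{leucocyte_mutant_cells:.0%}")
print(f"A/G germ cells: ~{germline_mutant_cells:.0%}")
```

Under these assumptions, roughly 40 % of leucocytes and 30 % of germ cells would carry the mutation, both well below the 100 % expected for a constitutional heterozygote.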


4.5    Cis-Effects of the CLPG Mutation

The strong sequence conservation of the dodecamer motif suggested that it might
act as a cis-acting regulatory element that might be perturbed by the CLPG mutation.
Indeed,  it  was  observed  that  on  paternal  transmission  of  the  CLPG  mutation,  DLK1 
and  RTL1  transcript  levels  remained  elevated  in  postnatal  skeletal  muscle  at  ages 
where  these  genes  are  normally  nearly  completely  silenced,  and  that  all 
corresponding  transcripts  originated  from  the  padumnal  allele.  Accordingly,  on 
maternal  transmission  of  the  CLPG  mutation,  levels  of  GTL2,  RTL1-AS,  RIAN, 
MIRG,  and  associated  sno-  and  miRNAs  remained  high  in  the  same  tissue  and  at  the 
same  developmental  stage,  all  corresponding  transcripts  originating  from 
the madumnal allele. Transcript levels of BEGAIN and DIO3 were unaffected by
the CLPG mutation whether transmitted paternally or maternally. Thus, the CLPG
mutation appeared to inactivate a cis-acting silencer element controlling the transcription
levels of a core cluster of genes from the BEGAIN-DIO3 domain in
postnatal  skeletal  muscle,  without  perturbing  their  imprinting  status  (Charlier 
et  al.  2001b;  Bidwell  et  al.  2001,  2004;  Smit  et  al.  2005;  Murphy  et  al.  2005; 
Perkins  et  al.  2006;  White  et  al.  2008;  Caiment  et  al.  2010). 

That the dodecamer motif indeed corresponds to a cis-acting
regulatory element is further supported by the fact that it maps to a histone-code
defined "strong enhancer" signature (Ernst et al. 2011) and that it binds Pol II and
JunD, albeit in erythroleukemic K562 cells (Gerstein et al. 2012). However, how
this regulatory element operates in skeletal muscle remains largely unknown. When
~100 bp centered around the mutation are subjected to a MatInspector analysis, the
first striking observation is that both wild-type and mutant sequences contain
E-boxes recognized by myogenic regulatory bHLH-containing factors (MRFs),
which might be related to the muscle-specificity of the regulatory element.
However, EMSA experiments did not reveal a differential affinity of wild-type
and CLPG oligonucleotides for MyoD (Freking et al. 2002). Intriguingly,
MatInspector analysis suggests that the CLPG mutation abrogates a binding site
for the Pokemon/ZBTB7A transcriptional repressor, which might be compatible
with the observed loss of silencer function.

Whichever the intervening trans-acting partners are, the effect of the CLPG
mutation is accompanied by at least three epigenetic cis-alterations: (1) the majority
of  CpG  sites  in  the  immediate  vicinity  of  the  CLPG  mutation  are  refractory  to 
methylation  imposed  in  postnatal  skeletal  muscle  on  the  wild-type  allele  (Murphy 
et  al.  2006;  Takeda  et  al.  2006),  (2)  the  CLPG  mutation  uncovers  novel  DNase-I 
hypersensitive  sites  (DHS)  in  postnatal  skeletal  muscle  of  which  one  colocalizes 
nearly  exactly  with  the  mutation  site  (Takeda  et  al.  2006),  and  (3)  the  CLPG 
mutation  enhances  bidirectional  intergenic  transcription  across  the  entire  DLK1- 
GTL2  interval  (Takeda  et  al.  2006). 

Hoping  to  facilitate  the  study  of  the  CLPG  phenomenon,  we  established  cultures 
of  myoblasts  derived  from  lambs  of  the  four  possible  CLPG  genotypes.  However, 
the cis-effect of the mutation on the transcript levels of the neighboring genes,
which  was  so  striking  in  vivo,  was  lost  in  the  cultured  myoblasts. 


4.6    Trans-Effects  of  the  CLPG  Mutation 


Uncovering  the  cis-effect  of  the  CLPG  mutation  revealed  at  least  part  of  its  modus 
operandi  yet  did  not  explain  polar  overdominance.  The  transcript  profile  of 
callipyge animals suggested that ectopic expression of DLK1 and/or RTL1
caused the phenotype; however, CLPG/CLPG animals shared this feature with their
+Mat/CLPGPat sibs, while being phenotypically wild-type.

The  veil  was  in  part  lifted  when  the  expression  of  DLK1  was  studied  at  the 
protein  rather  than  RNA  level.  Indeed,  despite  comparable  levels  of  DLK1 
transcripts  in  +Mat/CLPGPat  and  CLPG/CLPG  animals,  DLK1  protein  could  only 
be  detected  in  skeletal  muscle  of  +Mat/CLPGPat  ones.  There  was  thus  a  perfect 
correlation between the presence of DLK1 protein in skeletal muscle and phenotype:
present in callipyge +Mat/CLPGPat animals, absent in wild-type +/+,
CLPGMat/+Pat, and CLPG/CLPG animals (Davis et al. 2004). Note that it was
subsequently reported that low levels of DLK1 protein could be detected in skeletal
muscle of CLPG/CLPG animals (White et al. 2008).

It thus appeared that the DLK1 and RTL1 mRNAs transcribed from the padumnal
CLPG allele were posttranscriptionally down-regulated in CLPG/CLPG but not in
+Mat/CLPGPat individuals. The noncoding RNAs transcribed from the madumnal
CLPG allele in CLPG/CLPG but not in +Mat/CLPGPat animals were obviously the
best candidate mediators of this Mat-to-Pat trans-effect. Amongst these, the numerous
miRNAs processed from RTL1-AS and MIRG, in particular, stood out as
the prime suspects given the known function of miRNAs in posttranscriptional
inhibition of target mRNAs (Georges et al. 2003).
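
The logic of this working model (a cis-effect on each parental allele, combined with a Mat-to-Pat trans-inhibition) can be written out as a small truth table. The Python sketch below is a deliberately simplified, Boolean illustration of the hypothesis as stated in the text. The function and key names are hypothetical, and real expression levels are quantitative rather than on/off (low levels of DLK1 protein have been reported in CLPG/CLPG muscle).

```python
from typing import Dict, List, Tuple

WILD_TYPE, MUTANT = "+", "CLPG"

# The four possible genotypes, written as (maternal allele, paternal allele).
GENOTYPES: List[Tuple[str, str]] = [
    (WILD_TYPE, WILD_TYPE),
    (WILD_TYPE, MUTANT),
    (MUTANT, WILD_TYPE),
    (MUTANT, MUTANT),
]


def predict_muscle_state(maternal: str, paternal: str) -> Dict[str, bool]:
    """Boolean caricature of the cis/trans model for postnatal skeletal muscle."""
    # cis-effect: a padumnal CLPG allele keeps DLK1/RTL1 mRNA levels elevated.
    effector_mrna = paternal == MUTANT
    # cis-effect: a madumnal CLPG allele keeps the noncoding RNAs elevated.
    noncoding_rna = maternal == MUTANT
    # Hypothesized trans-effect: madumnal noncoding RNAs block the effectors
    # posttranscriptionally, so protein requires mRNA without that block.
    effector_protein = effector_mrna and not noncoding_rna
    return {
        "effector_mrna": effector_mrna,
        "noncoding_rna": noncoding_rna,
        "effector_protein": effector_protein,
        "callipyge": effector_protein,
    }


for mat, pat in GENOTYPES:
    state = predict_muscle_state(mat, pat)
    print(f"{mat}Mat/{pat}Pat -> callipyge: {state['callipyge']}")
```

In this sketch only the +Mat/CLPGPat genotype is predicted to be callipyge, which is the pattern that defines polar overdominance.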

Because of their perfect sequence complementarity with the intron-less RTL1
transcripts, the miRNAs processed from RTL1-AS were excellent candidate trans-inhibitors
of RTL1. Accordingly, we readily identified RTL1 degradation products
terminating at the exact positions predicted for AGO2-mediated "slicing" (i.e., the
residue facing nucleotide position 11 of the corresponding miRNA) in skeletal
muscle of CLPG/CLPG animals (Davis et al. 2005). Thus, in CLPG/CLPG animals,
miRNAs processed from the RTL1-AS pri-miRNA transcribed from the madumnal
allele  indeed  trans-inhibit  RTL1  transcripts  originating  from  the  padumnal  allele, 
the  very  mechanism  predicted  to  underlie  polar  overdominance.  The  observed 
RNAi  also  explained  why  RTL1  transcripts,  although  more  abundant  in  CLPG/ 
CLPG  and  +Mat/CLPGPat  animals  when  compared  to  the  two  other  genotypes,  were 
nevertheless  less  abundant  in  CLPG/CLPG  than  in  +Mat/CLPGPat  animals  (Charlier 
et  al.  2001b). 

Evidence for the same AGO2-mediated slicing of RTL1 transcripts, guided by
miRNAs derived from RTL1-AS, was also obtained in placenta of wild-type mice,
which is the more likely physiological location of this trans-interaction (Davis
et  al.  2005).  It  remains  one  of  the  very  few  examples  of  miRNA-dependent  slicing 
in  animals. 

The absence of DLK1 protein in CLPG/CLPG animals pointed to the occurrence
of a similar Mat-to-Pat trans-inhibition of DLK1 transcripts in these animals. Is this
trans-effect also mediated by miRNAs from the CLPG domain? No miRNAs from
the BEGAIN-DIO3 domain appeared to be perfectly complementary to DLK1
transcripts over their entire length, as observed for RTL1. However, it is well
established that, in the vast majority of cases, animal miRNAs recognize their
targets via fuzzy, partial sequence complementarity that tends to be nucleated by
the miRNA "seed" (corresponding to residues 2-8 of the miRNA), and primarily
occurs in the 3'UTR. This leads to down-regulation of the target by a combination
of slicing-independent mRNA degradation and translational inhibition. The
observed ~threefold reduction in DLK1 transcript levels and >tenfold reduction
in DLK1 protein levels in CLPG/CLPG animals, when compared to +Mat/CLPGPat
animals, was, at first glance, highly reminiscent of the initial report of the mode-of-action
of the paradigmatic lin-4 animal miRNA on its lin-28 target in C. elegans
(Seggerson et al. 2002), hence supporting miRNA-dependent down-regulation of
DLK1 in CLPG/CLPG animals.
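
Seed-centered target recognition, as described above, can be illustrated with a minimal Python sketch that scans a 3'UTR for perfect matches to residues 2-8 of a miRNA. This is a toy example only: published target-prediction tools also weigh site context, conservation, and pairing outside the seed. The sequences are invented and the function names are hypothetical.

```python
from typing import List

_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def seed_match_positions(mirna: str, utr: str) -> List[int]:
    """0-based start positions in `utr` of perfect matches to the miRNA seed."""
    seed = mirna[1:8]                    # residues 2-8, counted from the 5' end
    site = reverse_complement_rna(seed)  # sequence expected in the target
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


# Toy miRNA and toy 3'UTR holding two seed sites (sequences invented).
toy_mirna = "UACGGAUCCAGUUGACCUAGCA"
toy_utr = "AAAA" + "GAUCCGU" + "CCCC" + "GAUCCGU" + "UU"
print(seed_match_positions(toy_mirna, toy_utr))
```

A 7-nt seed match occurs frequently by chance in long transcripts, which is one reason why predictions of this kind have limited specificity on their own.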

To identify the miRNAs from the BEGAIN-DIO3 domain that might mediate the
trans-inhibition of DLK1 in CLPG/CLPG animals, we first used next-generation
sequencing to generate an exhaustive catalogue of miRNAs expressed in
longissimus dorsi of 8-week-old sheep of the four possible CLPG genotypes. We
identified a total of 747 miRNA "species." By species, we refer to a population of
"isomirs" (including edited variants) derived from the same arm of a hairpin
precursor. One hundred and ten of these species mapped to 74 hairpin loops residing
within the BEGAIN-DIO3 domain (fifty within MIRG and five within RTL1-AS).
We then evaluated whether any one of these miRNAs exhibited an unusual affinity
for DLK1 and might therefore be responsible for the observed Mat-to-Pat trans-effect.
We first evaluated the affinity of miRNAs for DLK1 using bioinformatic
predictions. None of the miRNAs could be convincingly incriminated using this
approach. At best, the whole set of miRNAs from the BEGAIN-DIO3 domain
showed a marginally significant affinity for the coding region of the DLK1
transcripts when considered jointly ("as a quadrille") (Caiment et al. 2010). As the
sensitivity and specificity of bioinformatic predictions of miRNA-target
interactions are reputedly poor, we backed the bioinformatic predictions up with
a reporter assay (in cell culture) designed to interrogate the affinity of all miRNAs
from the domain for the full-length ovine DLK1 transcripts. Again, not a single
miRNA emerged from this assay as being single-handedly capable of down-regulating
DLK1 to an extent comparable with that observed in vivo (Chen
et al. unpublished). We finally performed RNA immunoprecipitation (RIP)
experiments using anti-AGO2 antibodies under conditions that allow coimmunoprecipitation
of miRNA targets. DLK1 coimmunoprecipitation was not convincingly
increased in skeletal muscle of CLPG/CLPG animals when compared to +Mat/CLPGPat
animals (Takeda et al. unpublished). Taken together and against
expectations, these results did not support a direct role of miRNAs from the
BEGAIN-DIO3 domain in mediating the Mat-to-Pat trans-inhibition of DLK1
observed in CLPG/CLPG animals.

These findings obviously raise the question of what the actual molecular
mechanism might be. Are any of the maternally expressed long noncoding RNA
genes (such as GTL2) directly involved? Is the effect miRNA-dependent but
indirect (i.e., the miRNAs down-regulate a trans-acting factor that is required for
DLK1 translation)? Further work is required to resolve this open
question.

In addition to these blatant Mat-to-Pat trans-effects on RTL1 and DLK1, detailed
examination of transcript and DNA methylation levels revealed two more subtle
trans-effects of the CLPG mutation. The first corresponds to the consistently
observed higher expression levels of all maternally expressed noncoding RNA
genes in +Mat/CLPGPat when compared to +/+ and in CLPG/CLPG when compared
to CLPGMat/+Pat animals. Thus, the presence of the CLPG mutation on the
padumnal chromosome results, one way or the other, in increased expression
levels of the noncoding RNA genes transcribed from the madumnal chromosome
(e.g., Charlier et al. 2001b). The molecular mechanisms underlying this observation
remain unknown, but an attractive hypothesis is transvection, i.e., the somewhat
profane suggestion that cis-acting regulatory elements may actually sometimes
operate in trans (e.g., Kennisson and Southworth 2002). This hypothesis can now
be tested using Chromosome Conformation Capture, combined with allelic
discrimination.

The second subtle trans-effect is the observation that, in +Mat/CLPGPat and
CLPGMat/+Pat heterozygous individuals, the DNA methylation status of the
padumnal chromosome (in the immediate vicinity of the CLPG mutation), although
primarily determined by CLPG genotype, is modified by the genotype of the
madumnal chromosome. Thus, the padumnal CLPG allele is more methylated in
+Mat/CLPGPat than in CLPG/CLPG animals, while the padumnal + allele is less
methylated in CLPGMat/+Pat animals than in +/+ animals (Takeda et al. 2006). The
significance  and  mechanisms  underlying  this  observation  remain  completely 
unknown. 


4.7    From  CLPG  Genotype  to  Callipyge  Phenotype 

The  previous  sections  highlight  the  considerable  progress  that  has  been  achieved  in 
deciphering  the  complex  effects  of  the  CLPG  mutation  on  the  expression  profiles  of 
a  core  cluster  of  flanking  genes  in  animals  of  the  four  possible  CLPG  genotypes. 
But how do these changes in gene expression lead to the callipyge muscular
hypertrophy of +Mat/CLPGPat animals?

One of the most striking characteristics distinguishing +Mat/CLPGPat animals
from  the  other  three  genotypes  is  the  ectopic  expression  of  DLK1  protein  confined 
to  hypertrophied  muscle.  To  test  whether  this  could  be  the  primary  determinant  of 
the  hypertrophy,  we  generated  transgenic  mice  expressing  the  membrane-bound  C 
form  of  ovine  DLK1  (i.e.,  the  isoform  found  to  be  ectopically  expressed  in  callipyge 
animals) under the control of the myosin light chain 3F promoter and 2E
enhancer. We obtained two lines with spatiotemporal expression of ovine DLK1
highly reminiscent of that observed in +Mat/CLPGPat sheep (i.e., ectopic expression
in postnatal skeletal muscle). In both lines, adult (11 or 25 weeks old) transgenic
animals exhibited a highly significant muscular hypertrophy resulting from a ~10 %
increase in myofiber diameter. Taken together, we considered that these results
provided strong support for a direct role of postnatal DLK1 overexpression in
mediating the muscular hypertrophy of callipyge sheep (Davis et al. 2004). It is
noteworthy  that  an  increase  in  DLK1  expression  respecting  the  spatiotemporal 
expression  pattern  of  the  endogenous  locus  (hence  prenatal  but  not  postnatal 
overexpression  in  skeletal  muscle)  appears  not  to  affect  the  development  of  skeletal 
muscle  (da  Rocha  et  al.  2009),  yet  that  conditional,  muscle-specific  knockout  of 
DLK1  causes  a  reduction  in  muscle  mass  (Waddell  et  al.  2010). 

+Mat/CLPGPat callipyge sheep most likely also differ from the
three other genotypes by the ectopic expression of RTL1 (Byrne et al. 2010). Does
RTL1 also contribute to the callipyge phenotype? We have produced transgenic
lines expressing ovine RTL1 under the control of the same myosin light chain
3F promoter and 2E enhancer. Preliminary results suggest that ectopic expression
of RTL1 enhances organismal (i.e., multi-organ) growth, including that of muscle.
Contrary  to  callipyge  sheep,  however,  the  relative  mass  of  muscle  does  not  seem  to 
be  affected.  Double  transgenic  mice  (DLK1  and  RTL1)  are  presently  being 
generated  to  test  for  a  possible  synergistic  effect  between  these  two  genes.  It 
remains  possible,  therefore,  that  RTL1  participates  with  DLK1  in  causing  the 
callipyge  phenotype. 

How does ectopic expression of DLK1 and possibly RTL1 cause the observed
muscular hypertrophy, i.e., what are the downstream effectors targeted by DLK1
and/or RTL1? Three studies have used bovine microarrays to compare the
transcriptome of affected muscle (e.g., longissimus dorsi and semimembranosus)
between  callipyge  (+Mat/CLPGPat)  and  wild-type  (+/+)  animals  (Fleming-Waddell 
et  al.  2007,  2009;  Vuocolo  et  al.  2007).  Amongst  the  genes  whose  expression  was 
most  significantly  affected,  figured — as  expected — a  number  of  myosin  heavy 
chain  genes,  reflecting  the  shift  towards  glycolytic  type  II  fibers  in  hypertrophied 
muscle.  The  expression  of  the  same  set  of  genes  was  predicted  to  be  altered  when 
comparing  semimembranosus  and  longissimus  dorsi  (predominantly  fast  oxidative) 
with  semitendinosus  (predominantly  fast  glycolytic)  in  wild-type  (+/+)  animals, 
and  this  was  used  in  one  of  the  studies  to  identify  a  set  of  differentially  expressed 
genes  specific  for  the  SD  (+Mat/CLPGPat)  versus  SD  (+/+)  contrast.  Across  the  three 
studies,  more  than  300  genes  were  identified  whose  expression  might  be  altered  in 
hypertrophied muscles of callipyge animals. Slightly more than half of these
would be overexpressed in callipyge muscle. Analysis of the gene lists did not
clearly  reveal  the  predominant  role  of  specific  pathways,  yet  the  authors  highlighted 
a  possible  involvement  of  histone  modifying  enzymes,  and  of  the  AKT/mTOR 
signaling  pathway.  It  is  noteworthy  that  the  skeletal  muscle  transcriptome  of 
CLPGMat/+Pat animals (which are ectopically expressing the maternally expressed
noncoding  RNAs)  was  virtually  not  altered  when  compared  to  that  of  +/+  animals. 
The  secondary  events  leading  to  the  callipyge  muscular  hypertrophy  therefore 
remain  largely  unknown. 


4.8    Polar Overdominance: Sheep Idiosyncrasy
or  Common  Phenomenon? 

There  is  at  least  one  other  mammalian  phenotype  that  is  inherited  in  a  manner 
reminiscent  of  polar  overdominance:  the  DDK  syndrome  (recognized  as  early  as 
1967).  The  main  feature  of  the  DDK  syndrome  is  the  nearly  fully  penetrant  early 
lethality of embryos derived from crosses between females from the DDK strain
and males from most non-DDK strains, while the reciprocal cross and the respective
within-strain matings are fully fertile. It is now known that the embryonic
lethality is due to the incompatibility between a cytoplasmic factor contributed by
the DDK ooplasm (possibly an RNA) and a nuclear factor contributed by the
non-DDK sperm cell. The two factors are encoded by a pair of closely linked genes
mapping to the Om (Ovum mutant) locus on mouse chromosome 11. The parent-of-origin
effect characterizing the DDK syndrome therefore results from the involvement
of genes expressed in either the male or female germ line rather than from the
involvement of imprinted genes (reviewed in Wakasugi 2007).

Recently, Wolf et al. (2008) scanned the mouse genome for quantitative trait
loci (QTL) influencing weight and growth in a purpose-built three-generation
design allowing distinction of alternative heterozygotes and hence testing for
parent-of-origin effects independent of maternal effects. They detected multiple
supposedly imprinted QTL (iQTL), including several characterized by polar
overdominance reminiscent of the callipyge phenomenon, as well as iQTL with
polar underdominance or bipolar dominance (the two heterozygous
genotypes differ from each other, while the homozygous genotypes do not).
These  results  suggest  that  polar  overdominance  and  related  inheritance  patterns 
might  be  more  common  than  generally  appreciated. 
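
The inheritance patterns named above can be told apart from the phenotypic means of the four ordered genotypes. The Python sketch below is a simplified illustration, not the statistical procedure of Wolf et al. (2008): it uses a fixed tolerance instead of a formal test, it treats polar underdominance as the mirror image of polar overdominance and bipolar dominance as heterozygotes deviating in opposite directions, and all names and numbers are hypothetical.

```python
def classify_parent_of_origin_pattern(
    hom_a: float,
    het_a_mat: float,
    het_a_pat: float,
    hom_b: float,
    *,
    tol: float,
) -> str:
    """Classify a locus from the mean phenotype of its four ordered genotypes.

    hom_a, hom_b: the two homozygotes.
    het_a_mat: heterozygote that inherited allele A from the dam.
    het_a_pat: heterozygote that inherited allele A from the sire.
    tol: differences of at most `tol` are treated as "no difference".
    """
    if abs(hom_a - hom_b) > tol:
        return "other"  # homozygotes differ, e.g. additive or classical imprinting
    baseline = (hom_a + hom_b) / 2.0
    dev_mat = het_a_mat - baseline
    dev_pat = het_a_pat - baseline
    mat_differs = abs(dev_mat) > tol
    pat_differs = abs(dev_pat) > tol
    if mat_differs != pat_differs:
        # Exactly one heterozygote departs from the three other genotypes.
        deviation = dev_mat if mat_differs else dev_pat
        return "polar overdominance" if deviation > 0 else "polar underdominance"
    if mat_differs and pat_differs and dev_mat * dev_pat < 0:
        # The heterozygotes depart in opposite directions.
        return "bipolar dominance"
    return "other"


# Made-up trait means: only one heterozygote class is heavier.
print(classify_parent_of_origin_pattern(10.0, 14.0, 10.0, 10.0, tol=1.0))
```

The key design requirement, as noted in the text, is that the two reciprocal heterozygotes can be distinguished at all; without that, all of these patterns collapse into an apparent absence of effect or a weak overdominance signal.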

Kong et al. (2009) revisited seven risk loci identified by GWAS for common
complex diseases in humans and mapping in the vicinity of known imprinted gene
clusters. They accounted for the parental origin of an individual's haplotypes, which
allowed them to test for imprinting effects. Significant parent-of-origin effects were
observed for five of the seven examined loci. With the exception of one of these loci, it
is somewhat unclear whether the authors specifically tested for polar
overdominance-like effects, and it therefore remains difficult to conclude from
this study to what extent such modes of inheritance might contribute to inherited risk
for common complex diseases.

One study examined association between common variants in the DLK1-GTL2
region  and  childhood  obesity  using  trio  data  allowing  distinction  between  the  two 
classes  of  heterozygotes,  and  therefore  testing  of  parent-of-origin  effects  including 
polar  overdominance  (Wermter  et  al.  2008).  Intriguingly,  modeling  parent-of- 
origin  effects  revealed  an  association  that  was  overlooked  when  using  a  standard 
Mendelian  model,  and  polar  overdominance  appeared  to  better  explain  the  data 
than a simple imprinting model. However, the effect was strongest for rs1802710, a
synonymous  variant  in  the  DLK1  ORF,  which  remains  difficult  to  reconcile  with  the 
complex  regulatory  effects  observed  for  the  ovine  CLPG  mutation.  Follow-up 
studies  are  needed  to  confirm  and  extend  these  intriguing  observations. 

There have been at least two reports claiming the detection in line-crosses of
QTL affecting either growth, fatness, or muscle mass, with polar overdominance-like
effects in the vicinity of the porcine BEGAIN-DIO3 region (Kim et al. 2004;
Boysen et al. 2010). It is important to note, however, that the experimental designs
and analysis methods used in both studies are now known to be prone to detection
of false-positive imprinting effects. This is primarily due to the fact that the
analysis methods assume that alternative QTL alleles (Q and q) are fixed in the
parental lines, and that all F1 parents are therefore assumed to be of Qq QTL
genotype. It was shown that such models are very often not accurate and cause
so-called pseudo-imprinting (e.g., Sandor and Georges 2008). Hence, further work is
needed  to  evaluate  the  widespread  importance  of  polar  overdominance  in  livestock. 


4.9    Conclusion

Analysis of the callipyge phenotype and its unusual polar overdominant inheritance
has revealed unique molecular mechanisms including, in particular, the miRNA-mediated
interaction between the maternal and paternal homologues at the
imprinted BEGAIN-DIO3 domain. Much remains to be learned, however, about
the ultimate causes of polar overdominance: what are the trans-acting factors
mediating the cis-effect of the CLPG mutation and how do they exert their effect?
What are the mediators of the Mat-to-Pat trans-inhibition of DLK1, the most likely
effector of the callipyge muscular hypertrophy? What are the secondary
mechanisms by which ectopic expression of the callipyge effectors exerts its
effect? What explains the rostro-caudal phenotypic gradient? Moreover, a more
detailed understanding of the callipyge phenomenology may contribute to a better
understanding of the BEGAIN-DIO3 domain and in particular of the role of the
lincRNAs expressed from the madumnal chromosome: what are the roles of GTL2,
RTL1-AS, RIAN, MIRG, and the C/D snoRNAs and miRNAs derived from them?
What are the different cis-elements regulating the expression of the genes in the
BEGAIN-DIO3 domain and how do they interact? Where does the Mat-to-Pat trans-interaction,
revealed by studying the callipyge phenotype, occur in wild-type
animals and what is its biological significance?


Chapter 5

Transgenerational  Epigenetic  Effects 
and  Complex  Inheritance  Patterns 

Anna  K.  Naumova 


Abstract  Mendel's  laws  describing  the  inheritance  of  simple  genetic  traits  are 
based  on  the  assumption  that  genes  are  transmitted  unchanged  from  parents  to 
offspring.  However,  this  assumption  is  not  always  correct  as  the  mammalian 
genome  undergoes  massive  epigenetic  reprogramming  during  gametogenesis  and 
early  embryonic  development.  Furthermore,  stochastic  or  environmentally  induced 
epigenetic  variation  may  lead  to  situations  where  the  epigenetic  marking  of  the 
same allele differs between parents and offspring, among siblings or even monozygotic
twins. Epigenetic marks may modify the penetrance of a phenotype and cause
"non-Mendelian"  inheritance  even  when  the  causal  genetic  variant  is  transmitted 
from  parent  to  offspring  in  perfect  Mendelian  proportions.  In  this  chapter  we  focus 
on  a  particular  type  of  inheritance  where  epigenetic  marks  that  are  present  in 
parental  somatic  cells  fail  to  be  reset  in  a  proportion  of  parental  germ  cells  and 
are  transmitted  to  offspring.  Such  a  transgenerational  "epigenetic  memory" 
increases  the  complexity  of  the  inheritance  pattern  and  therefore  is  of  particular 
interest  for  geneticists. 


5.1    Epigenetic Remodeling of the Genome
in  Germ  Cells  and  Early  Embryos 

Current  understanding  of  the  multistep  epigenetic  remodeling  of  the  genome  that 
takes  place  in  the  mammalian  germ  line  is  based  largely  on  genome-wide  analyses 
of  DNA  methylation  levels  and  detailed  studies  of  imprinted  genes  (for  detailed 
description  of  epigenetic  reprogramming  in  the  mammalian  germ  line  see  Chap.  1). 

Primordial  germ  cells  are  the  earliest  precursors  of  an  individual's  gametes. 
They  originate  from  the  somatic  cells  of  primary  ectoderm  of  the  embryo  and 
migrate  to  the  future  gonad  (genital  ridge).  In  mice,  they  populate  the  genital  ridge 
between  days  11  and  13  (midgestation)  of  embryonic  development  (Ginsburg 
et  al.  1990;  Gomperts  et  al.  1994).  In  humans,  they  appear  at  the  genital  ridge 
between  4  and  6  weeks  post  conception,  i.e.,  during  the  first  trimester  of  pregnancy 
(Clark  2007).  After  migration  of  primordial  germ  cells  to  the  genital  ridge,  a  wave 
of genome-wide DNA demethylation erases the somatic cell-specific DNA methylation
marks from their genomes (Szabo and Mann 1995; Hajkova et al. 2002; Feng
et  al.  2010)  (Fig.  5.1).  This  prepares  the  primordial  germ  cell  genome  for  the 
acquisition  of  gamete-specific  epigenetic  marks,  which  depend  on  the  sex  of  the 
embryo.  In  mammals,  male  and  female  gametogenesis  differ  in  many  aspects 
including  the  developmental  timing,  length  of  certain  stages,  gene  expression 
profiles,  RNA  storage,  and  protein  accumulation  as  well  as  the  stages  when  the 
genomic  DNA  becomes  methylated.  In  males,  a  massive  gain  of  genomic  DNA 
methylation  occurs  before  or  at  the  spermatogonial  stage,  preceding  the  onset  of 
meiosis  (Davis  et  al.  1999, 2000;  Kaneda  et  al.  2004;  Trasler  2009).  Spermatogonia, 
as  well  as  cells  at  later  stages  of  spermatogenesis,  are  constantly  renewed  and 
produced  during  a  male's  life.  In  females,  the  onset  of  prophase  I  of  meiosis  takes 
place  before  birth.  Oocytes  reach  the  diplotene  stage  of  meiotic  prophase  and  are 
stored  at  this  stage  for  months  (in  mice)  or  decades  (in  humans  and  other  mammals 
with  a  longer  reproductive  life  span).  After  puberty,  every  estrus  cycle  several 
oocytes  are  recruited  into  the  growing  oocyte  pool.  They  grow  and  complete  the 
first meiotic division, whereas the second meiotic division occurs only after fertilization
(reviewed in Amleh et al. 2012). During oocyte growth and maturation,
genome-wide  acquisition  of  DNA  methylation  occurs  (Obata  and  Kono  2002; 
Lucifero  et  al.  2004;  Hiura  et  al.  2006;  Smallwood  et  al.  2011). 

The  sex-specific  epigenetic  marks  that  are  established  during  gametogenesis 
result  in  genome-wide  epigenetic  differences  between  the  mature  spermatozoan 
and oocyte genomes and consequently genome-wide differences between the maternally
and paternally derived chromosomes in the embryo (Fig. 5.1) (for details see
Chap.  1).  After  fertilization,  the  first  steps  of  embryonic  development  are  marked  by 
genome-wide  loss  of  DNA  methylation  (Howlett  and  Reik  1991;  Monk  1995)  and 
chromatin  reorganization  (reviewed  in  Feng  et  al.  2010).  The  consequence  of  this 
wave of DNA demethylation is loss of parent-of-origin-dependent DNA methylation
at most genomic loci. After global erasure of the vast majority of gametic DNA
methylation marks, a new wave of DNA methylation occurs during and after the
blastocyst stage (Monk 1995). From this point on, cell type-specific DNA methylation
patterns that are essential for the maintenance of tissue-specific gene expression
are established (Feng et al. 2010).

Fig. 5.1 The cycle of epigenetic reprogramming. DNA methylation and demethylation in the
mammalian germ line and embryo

Thus, in mammals, there are two waves of genome-wide DNA demethylation
that  occur  during  the  time  window  which  is  particularly  important  for  genetic 
transmission,  in  primordial  germ  cells  and  in  early  preimplantation  embryos;  and 
at  least  one  wave  of  DNA  methylation  that  occurs  during  gametogenesis.  Based  on 
the  cycle  of  DNA  demethylation  and  remethylation,  one  may  predict  that  anomalies 
in  methylation  reprogramming  could  cause  dramatic  changes  in  the  epigenetic 
marking  of  the  gametic  or  embryonic  genomes,  and  thereby  influence  inheritance 
of  genetic  traits.  It  follows  therefore  that  genetic  or  environmental  factors  that  cause 
subtle  shifts  in  the  global  or  locus-specific  DNA  methylation  levels  specifically  in 
the  germ  cells  of  the  developing  embryo  will  not  affect  the  individual's  somatic 
cells:  their  effects  will  come  to  light  only  in  the  next  generation. 


5.2 Transgenerational Epigenetic Inheritance.
Transgenes  and  Retroviral  Elements 

The  definition  of  transgenerational  epigenetic  inheritance  is  often  a  matter  of 
debate  (Chong  et  al.  2007a,  b)  and  several  excellent  reviews  have  been  recently 
published  on  the  subject  (Daxinger  and  Whitelaw  2012;  Grossniklaus  et  al.  2013; 
Lim  and  Brunet  2013).  In  this  chapter,  the  term  "transgenerational  epigenetic 
inheritance"  refers  to  inheritance  of  the  epigenetic  marks  of  the  somatic  cells  of 
the  parent  by  his/her  offspring,  through  the  germ  line  (Morgan  et  al.  1999;  Chong 
et  al.  2007a).  Emerging  evidence  implicates  specific  classes  of  microRNAs  in 
transgenerational  epigenetic  effects  (Nelson  and  Nadeau  2010;  Ashe  et  al.  2012; 
Daxinger  and  Whitelaw  2012;  Grentzinger  et  al.  2012).  However,  in  this  chapter 
we  focus  on  the  currently  best  understood  epigenetic  mark — DNA  methylation. 
For transgenerational epigenetic inheritance to occur, the somatic DNA methylation
marks must resist the two waves of demethylation, in the primordial germ cells
and  in  preimplantation  embryos,  as  well  as  the  waves  of  remethylation  during 
gametogenesis  and  embryonic  development  (Fig.  5.1).  In  plants,  this  is  a  common 
occurrence  (Mathieu  et  al.  2007;  Feng  et  al.  2010).  In  mammals,  very  few  bona  fide 
cases  of  transgenerational  epigenetic  inheritance  have  been  documented  in  detail. 
Nevertheless  the  phenomenon  may  be  of  greater  importance  than  previously 
appreciated  as  several  recent  genome-wide  studies  of  DNA  methylation  in  mouse 
gametes and embryos demonstrated that certain classes of DNA sequences, including
endogenous retroviruses, resist genome-wide demethylation in primordial germ
cells (Hackett et al. 2013) and that methylation levels of CpG islands in preimplantation
embryos depend upon their methylation levels in oocytes (Smallwood
et  al.  2011;  Guibert  et  al.  2012).  These  recent  data  demonstrate  that  epigenetic 
reprogramming  is  not  equally  efficient  at  all  genomic  regions/germ  cells,  providing 
direct  evidence  for  the  hypothesis  that  was  first  proposed  a  couple  of  decades  ago 
based  on  non-Mendelian  inheritance  patterns  at  several  genomic  loci  (Laird  1987; 
Follette  and  Laird  1992;  Naumova  and  Sapienza  1994;  Naumova  et  al.  2001). 

Apart from its role in cell differentiation and the orchestration of genome-wide
gene  regulation,  DNA  methylation  acts  as  a  host-response  mechanism  protecting 
the  integrity  of  cellular  genomes  from  foreign  DNA  invasion  (reviewed  in  Bestor 
1998; Matzke and Matzke 1998). Mammalian genomes harbor thousands of retroviral
transposable elements, also termed endogenous retroviruses. Their expression
is  critical  for  transposition  and  spreading  throughout  the  genome  (reviewed  in 
Hancks  and  Kazazian  2012),  whereas  methylation  of  endogenous  retroviruses 
prevents  their  expression  and  thereby  maintains  genome  integrity  (Bourc'his  and 
Bestor  2004;  Maksakova  et  al.  2008).  This  host  defense  mechanism  also  targets 
transgenic  sequences  that  are  inserted  into  the  mouse  genome  for  research  purposes, 
thereby  often  hampering  transgenic  research  (Reik  et  al.  1987;  Sapienza  et  al.  1987; 
McGowan  et  al.  1989).  The  phenomenon  of  transgene  methylation  was  investigated  in 
detail  in  several  transgenic  mouse  lines  (Reik  et  al.  1987;  Swain  et  al.  1987;  McGowan 
et  al.  1989;  Allen  et  al.  1990;  Pickard  et  al.  2001;  Valenza-Schaerly  et  al.  2001). 
These studies demonstrated that transgene methylation may depend upon the parent of
origin  (Reik  et  al.  1987;  McGowan  et  al.  1989)  or  the  mouse  strain  (Weng  et  al.  1995; 
Engler  et  al.  1998).  Moreover,  in  certain  transgenic  lines  the  transgenic  DNA  sequence 
may  accumulate  DNA  methylation  over  generations  (Allen  et  al.  1990),  becoming 
increasingly  methylated  with  every  germ  line  transmission.  This  indicates  that  the 
transgenic  sequence  resisted  the  waves  of  demethylation  in  primordial  germ  cells 
and/or  early  embryos  (Kearns  et  al.  2000),  but  attracted  methylation  during  gameto- 
genesis  and  perhaps  at  later  stages  of  embryonic  development.  A  series  of  in-depth 
studies  of  methylation  of  retroviral  elements  revealed  a  similar  epigenetic  response  to 
endogenous  retroviruses  (reviewed  in  Rakyan  et  al.  2002). 

The intracisternal A particle (IAP) is a mobile DNA element that contains a full
retroviral  genome  flanked  by  two  long  terminal  repeats  (LTR)  (Kuff  and  Lueders 
1988).  Insertion  of  an  IAP  near  or  within  a  gene  may  cause  changes  in  gene 
regulation  because  the  IAP  harbors  promoters  that  may  overtake  regulation  of 
expression  of  neighboring  genes.  Methylation  of  the  IAP  promoter  prevents  its 
adverse  effect  on  neighboring  genes.  Two  animal  models  on  which  most  of  our 
current  understanding  of  transgenerational  epigenetic  inheritance  is  based  are 
(1)  the  insertion  of  an  IAP  upstream  of  the  agouti  locus  (Perry  et  al.  1994)  giving 
rise  to  the  agouti  viable  yellow  allele  (Avy)  (Dickies  1962)  and  (2)  the  insertion  of  an 
IAP into intron 6 of the Axin1 gene (Vasicek et al. 1997) giving rise to the Axin
fused allele (AxinFu) (Reed 1937). In the case of the dominant Avy allele, the IAP
LTR drives the expression of the agouti transcript, bypassing tissue-specific regulatory
elements and causing expression of the agouti gene in cells and at stages
where  normally  the  gene  is  not  expressed  (Morgan  et  al.  1999).  The  agouti  gene 
product  is  involved  in  melanocortin  receptor  binding.  Loss  of  proper  cell-specific 
regulation  of  the  agouti  locus  has  a  pleiotropic  effect  on  multiple  tissues  (Wolff 
1965)  including  the  hair  follicles  where  it  results  in  an  easily  scored  phenotype — a 
yellow  coat  color.  In  heterozygous  Avy  mice  DNA  methylation  levels  of  the  IAP  LTR 
vary  between  individual  mice  and  even  between  individual  cells  (Morgan 
et  al.  1999).  Mice  with  non-methylated  IAP  have  a  yellow  coat  and  develop  obesity. 
Hypermethylation  of  the  IAP  restores  normal  regulation  of  the  agouti  gene  and 
produces a pseudo-wild-type phenotype with brownish colored fur (termed
pseudoagouti)  (Morgan  et  al.  1999).  Variation  in  IAP  methylation  among  individual 
cells  results  in  mosaic  expression  of  the  agouti  gene  and  therefore  in  a  mosaic,  also 
termed  mottled,  coat  color  (Fig.  5.2).  Thus,  a  single  mutation  causes  a  range  of  coat 
color  phenotypes.  In  every  generation,  about  20  %  of  the  offspring  of  heterozygous 
male  carriers  of  the  Avy  allele  are  pseudoagouti,  independent  of  the  color  of  the 
father,  i.e.,  independent  of  the  methylation  level  of  the  allele  in  the  parental  somatic 
cells  (Wolff  1978;  Morgan  et  al.  1999)  (Fig.  5.2a).  However,  the  IAP  methylation 
level  in  the  mother  and  hence  maternal  coat  color  provide  a  good  prediction  for  the 
distribution  of  the  coat  color  among  her  offspring:  pseudoagouti  mothers  produce 
more  pseudoagouti  pups  compared  to  yellow  mothers  (Morgan  et  al.  1999) 
(Fig.  5.2b-d).  Moreover,  pups  born  to  a  pedigree  with  a  pseudoagouti  mother  and 
maternal  grandmother,  have  the  highest  chance  of  wearing  pseudoagouti  coats 
(Morgan et al. 1999) (Fig. 5.2d). Such a correlation between the methylation levels
of the IAP in the somatic cells of the mother and the distribution of methylation
levels among her offspring results from failure to reset the somatic DNA methylation
marks at the IAP in the Avy allele during gametogenesis and preimplantation
development (Rakyan et al. 2003; Blewitt et al. 2006). Thus, the reprogramming
defect adds another layer to the inheritance of the genotype: inheritance of the
epigenotype, or transgenerational epigenetic inheritance. It is important to note that
in this model, the fact that inbred mice have virtually identical genomes excludes the
possibility that the trait is oligogenic.

Fig. 5.2 Stochastic epigenetic variation or transgenerational epigenetic inheritance. The diagram
depicts the inheritance of the coat color phenotypes in mouse carriers of the agouti viable yellow (Avy)
allele. (a) Stochastic variation in methylation in the offspring does not depend upon the allelic
methylation of the IAP LTR in the father. Paternal methylation marks are fully erased after
fertilization, but their establishment in the embryo is variable and subject to stochastic influences.
(b-d) Transgenerational epigenetic inheritance of allelic DNA methylation and resulting
phenotypes. The individual's coat color depends on methylation of the IAP LTR in the mother.
DNA methylation is not fully erased in the germ line of the mother and influences the LTR DNA
methylation level in the cells of her offspring; therefore mothers with higher methylation levels
have a higher chance of producing offspring with high methylation levels and vice versa.
Key: yellow coat color, most cells contain unmethylated IAP; mottled coat color, mosaic with
predominance of cells with unmethylated IAP; mottled coat color, mosaic with equal proportions
of cells with unmethylated and methylated IAP; pseudoagouti coat color, most cells contain
methylated IAP

Similar to the Avy allele, the AxinFu mutation results from an insertion of the IAP
into intron 6 of the Axin1 gene located on mouse chromosome 17 (Vasicek
et  al.  1997).  The  LTR  of  the  IAP  acts  as  a  promoter  and  directs  the  synthesis  of 
several  abnormal  transcripts.  Abnormal  transcription  products  interfere  with  the 
normal  function  of  the  AXIN1  protein  and  cause  vertebral  fusion  and  a  kinky  tail. 
Consequently,  in  the  homozygous  state,  the  mutation  causes  multiple  anomalies 
and  is  embryonic  lethal  (Reed  1937;  Dunn  and  Caspari  1942).  However,  in  the 
heterozygous  state  it  is  dominant  with  variable  penetrance,  i.e.,  a  proportion  of 
heterozygous carriers display mutant phenotypes (kinky tails) whereas other
carriers  have  normal  tails  (Reed  1937).  Initially,  the  mode  of  inheritance  was 
defined  as  semidominant  (Reed  1937);  however  a  better  understanding  of  the 
mechanism  supports  a  dominant  mode  of  inheritance  with  reduced  penetrance. 
Similar  to  the  Avy  mutation,  loss  of  penetrance  of  this  dominant  mutation  is  due 
to methylation of the LTR promoter, which prevents abnormal transcription in the
locus  (Rakyan  et  al.  2003).  The  inheritance  pattern  of  the  kinked  tail  phenotype  is 
consistent  with  the  transgenerational  epigenetic  inheritance  paradigm  (Rakyan 
et al. 2003). An in-depth analysis of the methylation changes during the developmental
window between fertilization and blastocyst stage that is critical for epigenetic
reprogramming demonstrated that methylation of the IAP is lost during
preimplantation  development,  but  reestablished  after  implantation  in  accordance 
with  the  methylation  of  the  gametes  that  gave  rise  to  the  embryo.  Hence,  another 
epigenetic  mark,  distinct  from  DNA  methylation,  directs  the  reestablishment  of 
DNA  methylation  patterns  at  the  IAP  elements.  Several  candidates,  including  a 
specific profile of histone modifications and presence of hydroxymethylcytosine, are
proposed  as  guides  for  the  remethylation  of  the  IAP  after  implantation  (Fernandez- 
Gonzalez  et  al.  2010;  Hackett  et  al.  2013).  Thus,  the  sum  of  evidence  indicates  that 
variation  in  the  IAP  methylation  patterns  and  thereby  the  penetrance  of  Avy  and 
AxinFu mutations stem from two sources. Firstly, the somatic epigenetic marks of
the  paternal  IAP  LTR  alleles  are  not  erased  in  the  primordial  germ  cells,  nor  do  the 
alleles  show  substantial  gain  of  methylation  during  gametogenesis  (Lane 
et  al.  2003;  Blewitt  et  al.  2006).  Therefore  the  embryo  inherits  almost  exactly  the 
same methylation pattern that was in the oocyte/spermatocyte of its parent. Secondly,
stochastic gain or loss of methylation occurs while the IAP LTR is being
remethylated  in  the  embryo  or  in  the  course  of  multiple  cell  divisions.  This  results 
in  the  embryo  being  mosaic  for  the  epigenotype  as  well  as  for  the  phenotype. 
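
The two sources of variation described above can be illustrated with a toy simulation. The sketch below is not a model taken from the cited studies: the function names, the `retention` parameter (the fraction of the parental methylation level assumed to escape erasure), the `baseline` level, the noise spread, and the pseudoagouti threshold are all hypothetical values chosen only for illustration. With `retention = 0` the offspring distribution is purely stochastic and independent of the parent; with `retention > 0` highly methylated parents tend to produce more highly methylated offspring.

```python
import random

def offspring_methylation(parent_level, retention, rng, noise_sd=0.25, baseline=0.4):
    """Return a toy IAP LTR methylation level (0..1) for one offspring.

    parent_level: methylation level of the transmitting parent's allele (0..1)
    retention:    hypothetical fraction of the parental mark that escapes erasure
    noise_sd:     hypothetical spread of stochastic re-establishment
    baseline:     hypothetical mean level reached when the mark is fully erased
    """
    expected = retention * parent_level + (1.0 - retention) * baseline
    level = rng.gauss(expected, noise_sd)
    return min(1.0, max(0.0, level))  # clamp to the valid range

def fraction_pseudoagouti(parent_level, retention, n=10000, threshold=0.8, seed=1):
    """Fraction of simulated offspring whose level reaches the (hypothetical) threshold."""
    rng = random.Random(seed)
    hits = sum(
        offspring_methylation(parent_level, retention, rng) >= threshold
        for _ in range(n)
    )
    return hits / n
```

Comparing `fraction_pseudoagouti(0.9, 0.5)` with `fraction_pseudoagouti(0.1, 0.5)` mimics the maternal effect, whereas setting `retention=0.0` gives the same fraction for any parental level, as described for paternal transmission of the Avy allele.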

Thus,  a  single  mutation  (IAP  insertion,  as  demonstrated  in  the  cases  of  the  Avy 
and AxinFu alleles) in combination with variable DNA methylation is capable of
producing a range of phenotypes. Such an interaction between genetic and epigenetic
factors generates a quantitative genetic trait or a dominant trait with variable
penetrance.  Moreover,  failure  to  reset  the  IAP  methylation  marks  in  the  germ  line 
adds  another  layer  of  inheritance — the  epigenetic  inheritance  where  the  somatic 
methylation  marks  are  transmitted  through  generations.  The  mutant  allele 
predisposes  to  a  specific  phenotype  (coat  color,  obesity,  or  kinky  tail)  but  unless 
the  methylation  state  of  this  allele  in  a  given  individual  is  known  the  phenotype 
cannot be predicted. In contrast to inbred laboratory mouse strains, the human
population is genetically highly heterogeneous. It is therefore difficult to distinguish
between transgenerational epigenetic inheritance and the action of several genetic
modifiers with additive effects. Variable penetrance of mutant alleles (such as that
seen in Avy and AxinFu heterozygous carrier mice), when observed in human families, is
likely to be explained by an effect of one or more unlinked modifier genes. The role
of  epigenetic  factors  in  the  etiology  of  human  disease  has  only  recently  attracted 
wide  attention  and  is  discussed  in  detail  in  other  chapters  of  this  book.  We  are  about 
to  witness  a  breakthrough  in  the  dissection  of  complex  genetic  disorders  in  humans 
with  specific  focus  on  the  role  of  DNA  methylation. 




A.K.  Naumova 


When  one  considers  the  host  defense  mechanism  and  the  evolutionary 
perspective,  it  is  plausible  that  protection  of  methylated  foreign  DNA  from  demeth- 
ylation  in  the  primordial  germ  cells  is  beneficial.  Once  a  foreign  DNA  sequence  is 
recognized,  methylated,  and  silenced,  it  is  more  economical  to  preserve  this 
epigenetic  information  rather  than  revisit  it  in  every  generation.  Therefore,  to 
avoid  erasure  in  primordial  germ  cells,  epigenetic  marks  on  foreign  DNA  would 
have  to  differ  from  epigenetic  marks  associated  with  endogenous  genes  (Hackett 
et  al.  2012).  It  seems,  therefore,  a  reasonable  conjecture  that  in  humans 
transgenerational epigenetic inheritance could also be detected in association with
viral  genomes  (endogenous  retroviruses,  hepatitis  B,  or  human  immunodeficiency 
virus)  that  integrated  into  the  human  genomic  DNA  and  may,  in  principle,  be 
transmitted  from  one  generation  to  another  (Hadchouel  et  al.  1985;  Naumova 
et  al.  1986;  Naumova  and  Kisselev  1990;  Tsuei  et  al.  2002;  Wang  et  al.  2011).  In 
humans,  retrotransposon  insertions  have  been  implicated  in  the  pathogenesis  of  a 
number  of  cases  of  single-gene  defects  (Hancks  and  Kazazian  2012)  but  their  role 
in  common  disease  has  not  been  addressed.  The  modeling  of  transgenerational 
epigenetic  effects  in  the  human  population  and  their  role  in  evolution  are  discussed 
in  Chap.  11. 


5.3    Epigenetic  Memory  at  Imprinted  Regions 
and  Transmission  Ratio  Distortion 

Certain  regions  in  the  mammalian  genome  are  imprinted,  i.e.,  carry  different  DNA 
methylation  marks  on  the  paternal  and  maternal  chromosomes.  Imprinted  regions 
are  demethylated  in  primordial  germ  cells  and  later  remethylated  in  the  germ  line  in 
a sex-specific manner, but unlike the gametic marks in the rest of the genome,
imprinting  marks  are  maintained  in  the  early  embryo  (see  Chap.  1  for  details).  This 
leads  to  conservation  of  parent-of-origin-dependent  methylation  at  imprinted 
regions  in  embryonic  somatic  cells  and  suggests  that  epigenetic  marks  other  than 
DNA  methylation  are  present  at  these  regions  and  guide  remethylation  of  DNA.  As 
a  consequence,  imprinted  regions  are  more  susceptible  to  DNA  methylation 
resetting  errors  that  occur  in  the  germ  line  since  these  errors  would  not  be  corrected 
by  global  demethylation  in  the  preimplantation  period.  Therefore  imprinted  regions 
are  prone  to  display  transgenerational  epigenetic  inheritance.  Because  of  the  pivotal 
role of imprinted genes in embryonic development, failure to properly reset genomic
imprints in the parental germ line would manifest either as embryonic death
and  infertility  or  congenital  developmental  disorders  (termed  imprinting  disorders) 
in  children.  Indeed,  a  proportion  of  cases  with  imprinting  disorders,  including 
Beckwith-Wiedemann  and  Prader-Willi  syndromes,  are  associated  with  abnormal 
DNA  methylation  and  are  consistent  with  imprint  resetting  defects  in  the  parental 
germ  line  (Reik  et  al.  1995;  Joyce  et  al.  1997;  Buiting  et  al.  1998;  DeBaun 
et  al.  2003;  Bliek  et  al.  2006).  Moreover,  failure  to  reset  genomic  imprints  has 
been proposed as the first hit in a modified version of Knudson's two-hit model
explaining  the  etiology  of  embryonic  tumors  such  as  retinoblastoma,  Wilms  tumor, 
rhabdomyosarcoma,  and  neuroblastoma  (Scrable  et  al.  1989,  1990;  Sapienza  1991; 
Naumova  and  Sapienza  1994).  This  hypothesis  is  further  supported  by  the  aberrant 
DNA  methylation  and  expression  of  imprinted  genes  in  embryonic  tumors  (Rainier 
et  al.  1993;  Ohtani-Fujita  et  al.  1997;  Cui  et  al.  1998;  Feinberg  2000).  Furthermore, 
it  has  been  recently  demonstrated  that  in  some  individuals,  a  proportion  of  mature 
ejaculated  spermatozoa  carry  somatic  instead  of  gametic  DNA  methylation  patterns 
in  imprinted  regions  (Marques  et  al.  2004;  Kobayashi  et  al.  2007,  2009;  Marques 
et  al.  2009).  Although  such  abnormal  imprinting  is  more  common  among  infertile 
men  with  oligospermia  (reduced  sperm  counts),  a  number  of  fertile  men  also  carry 
abnormal  DNA  methylation  profiles  (Marques  et  al.  2004;  Kobayashi  et  al.  2007, 
2009;  Marques  et  al.  2009).  A  study  of  DNA  methylation  imprints  in  spontaneously 
aborted  fetuses  and  their  fathers  demonstrated  that  the  fathers  of  fetuses  with 
abnormal  genomic  imprints  had  abnormal  DNA  methylation  patterns  at  the  same 
regions  in  their  spermatozoa  (Kobayashi  et  al.  2009).  Hence,  the  fetuses  inherited 
abnormal  methylation  of  imprinted  regions  from  their  fathers.  These  data  are  the 
most  compelling  evidence  to  date  in  support  of  transmission  of  imprint  resetting 
errors  through  the  germ  line  in  humans. 

To  fully  appreciate  the  impact  of  transgenerational  epigenetic  inheritance  in  the 
human  population,  an  estimate  of  its  prevalence  is  necessary.  It  could  be  an 
extremely  rare  occurrence  associated  with  specific  mutations  or  it  may  be  a 
common  phenomenon  whose  impact  on  the  population  is  reduced  by  negative 
selection  of  gametes  or  embryos  that  carry  unerased  somatic  epigenetic  marks 
(Croteau  et  al.  2001;  Naumova  et  al.  2001). 

If  failure  to  reset  somatic  DNA  methylation  imprints  commonly  occurs  in  a 
small  proportion  of  germ  cells,  this  will  result  in  parental  somatic  imprints  being 
transmitted  without  change  to  some  of  the  offspring  (Naumova  et  al.  2001;  Croteau 
et  al.  2002).  For  those  imprinted  regions  that  harbor  genes  that  are  essential  for 
embryonic  viability,  such  an  abnormal  imprinting  pattern  would  lead  to  embryonic 
loss.  This  loss  would  occur  very  early,  even  before  implantation,  and  would  then  be 
detected  as  deviation  from  1:1  Mendelian  transmission  ratio  of  alleles  at  the 
affected locus. This hypothesis predicts that transgenerational epigenetic inheritance
will be detected as non-Mendelian allelic transmission ratios at imprinted
genomic  regions  with  preferential  transmission  of  alleles  from  the  grandparent  of 
the  same  sex  as  the  parent  (i.e.,  alleles  from  the  maternal  grandmother  or  the 
paternal  grandfather)  (Naumova  et  al.  2001).  Indeed,  grandparental  origin- 
dependent  transmission  ratio  distortion  has  been  found  at  several  imprinted  regions 
in  both  humans  and  mice  (Naumova  et  al.  2001;  Croteau  et  al.  2002;  Yang 
et  al.  2008).  The  transmission  ratio  distortion  was  in  the  expected  direction,  i.e., 
in  most  cases  the  alleles  from  the  maternal  grandmother  or  the  paternal  grandfather 
were  transmitted  to  grandchildren  with  higher  probability  than  the  alleles  of  the 
paternal grandmother or the maternal grandfather. To test if the molecular mechanism
underlying grandparental origin-dependent transmission ratio distortion was
indeed related to DNA methylation, the effect of a mutation in the gene for the
major maintenance DNA methyltransferase DNMT1 on transmission ratio
distortion was tested in mouse crosses. The transmitting parent was heterozygous
for a loss-of-function mutation in the Dnmt1 gene, which reduced the abundance
of Dnmt1 transcripts in both male and female germ cells and led to reduction in
protein concentration. A Dnmt1 mutation in the mother restored Mendelian transmission
of maternal alleles at the locus that previously showed transmission ratio
distortion among the offspring of wild-type females. Conversely, a Dnmt1 mutation
in  the  father  resulted  in  deviation  from  Mendelian  transmission  ratios  of  paternal 
alleles  at  the  same  locus  (Yang  et  al.  2008).  These  data  support  the  role  of  DNA 
methylation  in  the  etiology  of  grandparental  origin-dependent  transmission  ratio 
distortion.  Effects  of  transmission  ratio  distortion  on  genetic  analyses  are  discussed 
in  detail  in  Chap.  12. 
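
Deviation from the 1:1 Mendelian expectation can be assessed with an exact two-sided binomial test. The sketch below is a generic illustration that uses only the Python standard library; it is not the statistical procedure of the cited studies, and the function name and the allele counts in the usage note are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k transmissions of one allele out of n.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null transmission probability p (0.5 for Mendelian
    segregation).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))
```

For example, `binomial_two_sided_p(60, 100)` evaluates a hypothetical sample in which 60 of 100 informative offspring inherited the allele of the maternal grandmother. A small p-value would indicate transmission ratio distortion, but it would not by itself identify the underlying mechanism.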


5.4    Genetic  Factors  and  Transgenerational 
Epigenetic  Effects 

Mutations  in  genes  that  encode  factors  required  for  epigenetic  reprogramming  may 
modify the likelihood that epigenetic information will be preserved through transmission
to the next generation. Among such mutations, those that compromise
epigenetic  remodeling  or  maintenance  of  epigenetic  marks  in  the  whole  genome 
will  cause  infertility  and/or  embryonic  death,  as  loss  of  proper  epigenetic  marks  at 
multiple genomic loci disorganizes the finely tuned gene expression ensemble.
Striking  examples  of  maternal  effects  where  mutations  in  the  mother  result  in 
chaotic  DNA  methylation  in  offspring  are  the  targeted  mutations  of  the  oocyte- 
specific isoform of Dnmt1 RNA, Dnmt1o, and the tripartite motif-containing 28 (Trim28)
gene,  which  encodes  a  component  of  the  heterochromatin-inducing  complex.  Both 
mutations  cause  extensive  variation  in  DNA  methylation  profiles  and  remarkable 
phenotypic  variation  among  embryos  derived  from  homozygous  mutant  females 
(Howell  et  al.  2001;  Cirio  et  al.  2008;  Toppings  et  al.  2008;  Messerschmidt 
et al. 2012). Maternal depletion of DNMT1o or TRIM28 renders embryos mosaic
for  imprinted  gene  expression  which  in  turn  is  manifested  as  variation  in  embryonic 
phenotypes  ranging  from  nearly  normal  appearance  to  severe  malformations  (Cirio 
et  al.  2008;  Toppings  et  al.  2008;  Messerschmidt  et  al.  2012)  (Fig.  5.3a).  The  vast 
majority of embryos from Dnmt1o-/- mothers die prenatally at different stages of
development  independent  of  their  genotypes  (Howell  et  al.  2001;  Toppings 
et  al.  2008),  therefore  analysis  of  further  generations  is  not  possible.  However, 
genetic  regulatory  variants  or  mutations  that  attenuate  enzymatic  or  DNA-binding 
activity  but  do  not  cause  catastrophic  failure  of  epigenetic  reprogramming  are  likely 
to  modify  the  fidelity  of  reprogramming  in  the  germ  line  and  early  embryo. 

DNA  methylation  patterns  depend  on  the  genetic  background  of  both  the  parents 
and  the  offspring.  Early  studies  of  transgene  methylation  in  mice  demonstrated  the 
critical  role  of  the  strain  harboring  the  transgene  (Weng  et  al.  1995;  Engler 
et al. 1998). A genetic modifier locus of transgene methylation was mapped to
the distal region of mouse chromosome 4 and was termed strain-specific modifier of
transgene methylation 1 (Ssm1) (Weng et al. 1995; Engler et al. 1998) (Table 5.1).
However, to date, no specific gene has been linked to this methylation activity and
the identity of Ssm1 remains unknown.

Fig. 5.3 Transgenerational epigenetic effects and complex inheritance patterns. (a) Maternal
effect mutation in the mouse Trim28 gene results in high variation in embryo phenotypes (courtesy
Dr. D. Messerschmidt; adapted from Messerschmidt et al. Trim28 Is Required for Epigenetic
Stability During Mouse Oocyte to Embryo Transition, Science 2012: v. 335, pp. 1499-1502 with
permission from The American Association for the Advancement of Science). Mutant E15.5
embryos displaying large array of phenotypes and growth defects compared with a control
embryo. (b) Mutations in genes encoding epigenetic modifiers or chromosomal translocations
that act in the germ line and affect the DNA methylation profiles of their target genes in trans
segregate independently from the target genomic regions and may or may not be present in the
affected individual. Such mutations or chromosomal anomalies exhibit parental effects on
phenotype.
Key to (b): mutation in gene encoding an epigenetic modifier, unaffected; epimutation in an
unlinked target region, affected; mutation and epimutation in the same individual, affected;
no mutation nor epimutation, unaffected


Comparison of the Avy and AxinFu mouse models uncovered a new dimension to
the phenomenon of transgenerational epigenetic inheritance: the effect of genetically
defined characteristics of the oocyte cytoplasm on the epigenetic marks
inherited from the father (Rakyan et al. 2003). Oocytes from inbred C57BL/6
females  (C57BL/6  was  the  background  strain  in  the  initial  Avy  studies)  successfully 
erased  the  epigenetic  marks  from  the  paternal  IAP,  whereas  oocytes  from  129/SvJ 
inbred  females,  which  were  used  in  the  Axin  u  study,  preserved  the  marks  (Rakyan 
et  al.  2003). 

To identify genetic modifiers of transgenerational epigenetic inheritance, a screen
of mutations that were induced by administering the chemical mutagen N-ethyl-N-nitrosourea
(ENU) to male mice was conducted (Blewitt et al. 2005; Chong
et  al.  2007b;  Ashe  et  al.  2008).  ENU-induced  mutations  were  evaluated  for  their 
effects  on  the  epigenetic  state  and  expression  of  a  transgene  that  contained  a  green 
fluorescent  protein  as  a  marker  of  expression  and  was  known  for  producing  phenotypic 
variegation.  Next,  each  mutation  was  assessed  for  its  ability  to  modify 
transgenerational  epigenetic  inheritance  in  the  Avy  mouse  model.  The  screen  identified 
a  number  of  modifiers  of  epigenetic  inheritance  (Table  5.1).  Most  of  those  whose 
functions  were  previously  known  turned  out  to  be  genes  involved  in  the  establishment 
and  regulation  of  epigenetic  states,  as  one  would  expect.  Genes  encoding  DNA 
methyltransferases  and  genes  encoding  chromatin  modifiers  were  among  those 
influencing  epigenetic  variation  and  transgenerational  inheritance  (Chong 
et  al.  2007b;  Ashe  et  al.  2008)  (Table  5.1).  SWI/SNF-related,  matrix-associated, 
actin-dependent  regulator  of  chromatin,  subfamily  a,  member  5  (Smarca5)  encodes 
a  chromatin-remodeling  protein  that  is  present  in  spermatocytes  and  round  spermatids. 
DNMT1  is  the  major  maintenance  DNA  methylation  enzyme  that  is  responsible  for 
copying  the  methylation  patterns  from  the  original  DNA  strand  to  the  newly 
synthesized strand before mitosis. DNMT3L is a member of the methyltransferase
family and a cofactor of de novo DNA methyltransferase 3A (Okano et al. 1999; Chen
et  al.  2005;  Ooi  et  al.  2007).  It  is  therefore  possible  that  in  humans,  common 
polymorphisms  affecting  expression  or  function  of  these  genes  could  influence 
transgenerational  epigenetic  effects. 


5.5    Epigenetic  Consequences  of  Chromosomal 
Translocations 

Heterozygosity for balanced chromosomal translocations interferes with homologous
chromosome pairing and synapsis that take place during meiotic prophase.
Portions  of  the  translocated  chromosomes  and  their  normal  homologs  that  are 
unable  to  synapse  are  silenced  by  a  mechanism  termed  meiotic  silencing  of 
unsynapsed  chromatin  (MSUC)  (reviewed  in  Burgoyne  et  al.  2009).  The  silent 
chromatin  state  spreads  along  the  chromosome  and  silences  genes  located  as  far  as 
10 Mb from the actual translocation breakpoint (Homolka et al. 2007). Interestingly,
meiotic silencing is not associated with DNA methylation of gene promoters
in  the  unsynapsed  region  (Saferali  et  al.  2010). 

In  humans,  balanced  translocations  occur  in  about  0.2  %  of  live  births  (Hamerton 
et al. 1975; Chen et al. 2010; Froslev-Friis et al. 2011). However, their prevalence is
much higher among infertile men and families with recurrent miscarriages (Hassold
1980, 1986;  Van  Assche  et  al.  1996).  Balanced  translocations  lead  to  a  wide  variety 
of  reproductive  outcomes  ranging  from  activation  of  the  mid-pachytene  checkpoint 
mechanism,  meiotic  arrest,  and  infertility,  problems  with  chromosome  segregation, 
aneuploid  gametes,  and  embryonic  lethality  among  offspring  to  completely  normal 
phenotypes.  Phenotypic  manifestations  depend  upon  the  sex  of  the  individual,  type 
of  translocation,  as  well  as  stochastic  factors.  It  has  been  hypothesized  that  the  effect 
of  a  particular  chromosomal  translocation  on  meiotic  progression  depends  upon  the 
functions  of  genes  located  within  the  unsynapsed  regions  near  the  translocation 
breakpoint  and  silenced  by  the  meiotic  silencing  mechanism  (Burgoyne  et  al.  2009). 
If  this  hypothesis  is  correct,  then  translocations  occurring  near  genes  that  are  critical 
for  epigenetic  reprogramming  during  and  after  meiotic  prophase  may  compromise 
the  resetting  of  epigenetic  marks  and  thereby  increase  the  probability  of  epigenetic 
errors and transgenerational epigenetic inheritance among the offspring of translocation
carriers.

This  hypothesis  was  tested  in  mice  that  carry  a  Robertsonian  translocation 
involving  chromosomes  8  and  12.  The  pericentromeric  region  (translocation 
breakpoint) of mouse chromosome 12 harbors the DNA methyltransferase 3A
(Dnmt3a)  gene.  DNMT3A  is  a  de  novo  DNA  methyltransferase  that  is  essential 
for  the  establishment  of  gametic  DNA  methylation  marks  in  imprinted  regions 
(Kaneda  et  al.  2004;  Kato  et  al.  2007).  If  Dnmt3a  is  silenced  during  meiotic 
prophase, its deficiency may affect DNA methylation levels throughout the spermatocyte
genome. Analysis of DNA methylation of imprinted regions in the sperm
of heterozygous translocation carriers showed abnormal DNA methylation at the
imprinted H19 locus on mouse chromosome 7 (Saferali et al. 2010). The H19
promoter  includes  a  differentially  methylated  region  that  is  normally  methylated 
in  sperm,  but  unmethylated  in  oocytes.  These  data  show  that  chromosomal 
translocations  may  cause  transgenerational  epigenetic  effects  in  trans.  Thus, 
depending  upon  the  genetic  context  of  the  translocation  breakpoint,  chromosomal 
translocations  may  cause  epigenetic  disturbances  at  a  specific  stage  of  meiosis  and 
promote transgenerational epigenetic inheritance at loci that segregate independently
from the translocation (Fig. 5.3b). This type of inheritance is somewhat
reminiscent  of  the  "hit-and-run"  mechanism  proposed  for  viral  carcinogenesis 
(Skinner  1976).  In  the  case  of  balanced  chromosomal  translocations  (similar  to 
genetic modifiers with parental effects) the causal genetic defect segregates independently
from its epigenetic target, the gene that is directly responsible for the
phenotypic change, and therefore may not be present in the affected individual
(Fig. 5.3b). In principle, with this type of inheritance, the target genes may be
mapped  through  epigenome-wide  association,  but  not  genetic  association  or  linkage 
mapping  approaches.  However,  the  causal  genetic  defect  would  only  be  found  if 
genetic  analyses  were  conducted  in  parents  rather  than  affected  offspring. 
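
The reasoning behind this last point can be illustrated with a short calculation. The Python sketch below is only an illustration under stated assumptions: a carrier parent transmits the causal genetic defect to half of the offspring, the defect has already acted in the parental germ line, and each gamete carries an epimutation at the unlinked target with a hypothetical probability `p_epimutation` (the value 1/5 is arbitrary and is not taken from any study cited here). Because the two events are assumed to be independent, the four offspring classes of Fig. 5.3b follow directly.

```python
from fractions import Fraction


def offspring_classes(p_epimutation, p_mutation=Fraction(1, 2)):
    """Expected fractions of the four offspring classes.

    Assumption (illustrative only): inheritance of the parental genetic
    defect and inheritance of the epimutation at the unlinked target
    locus are independent events.
    """
    p_epi = Fraction(p_epimutation)
    p_mut = Fraction(p_mutation)
    return {
        ("mutation", "epimutation"): p_mut * p_epi,                  # affected
        ("mutation", "no epimutation"): p_mut * (1 - p_epi),         # unaffected
        ("no mutation", "epimutation"): (1 - p_mut) * p_epi,         # affected
        ("no mutation", "no epimutation"): (1 - p_mut) * (1 - p_epi),  # unaffected
    }


def mutation_frequency_among_affected(classes):
    """Fraction of affected (epimutation-carrying) offspring that also
    carry the causal genetic defect."""
    affected = {k: v for k, v in classes.items() if k[1] == "epimutation"}
    return affected[("mutation", "epimutation")] / sum(affected.values())


# Hypothetical epimutation probability of 1/5 per gamete.
classes = offspring_classes(Fraction(1, 5))
print(mutation_frequency_among_affected(classes))  # 1/2
```

Under these assumptions the causal defect is found in half of the affected offspring and in half of the unaffected offspring, so it shows no association with the phenotype when only the offspring are genotyped.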

To  date,  however,  no  specific  human  pathology  has  been  associated  with  a 
parental  translocation  and  such  a  mode  of  inheritance.  The  caveat  here  is  that  this 
specific  mechanism  has  not  been  researched  in  genetic  studies  of  human  epigenetic 
disorders.  Furthermore,  infertility  in  carriers  of  balanced  translocations  affecting 
the epigenome may be an efficient barrier preventing the birth of children with
epigenetic anomalies. An illustration of this possibility is the increased prevalence
of translocations among infertility patients (Bache et al. 2004). The human genome
contains a large number of histone genes, some of which are organized into clusters
located on chromosomes 1 (1q21 and 1q42) and 6 (6p21-p22). Histones are
indispensable  for  normal  genome  function  in  both  somatic  and  meiotic  cells. 
Therefore,  one  may  predict  that  translocations  that  occur  near  these  histone  gene 
clusters  will  cause  their  meiotic  silencing.  Transcriptional  repression  of  histone 
genes  starting  from  the  pachytene  stage  of  meiotic  prophase  will  compromise 
the  epigenetic  control  of  the  whole  genome  and  lead  to  meiotic  arrest  and  infertility, 
or  cause  transgenerational  epigenetic  effects  in  a  proportion  of  live  offspring. 
A  survey  of  balanced  translocations  in  464  infertile  men  shows  an  excess  of 
chromosome 1 translocations, with the most common anomaly involving chromosomal
region 1q21 (Bache et al. 2004) near the histone cluster HIST2 in 1q21.2.
Hence, abnormal levels of transcription of histone genes during meiosis may be the
cause of infertility in 1q21 translocation carriers and potentially increase the risk of
epigenetic  anomalies  among  their  offspring. 


5.6    Environmental  Exposures  and  Transgenerational 
Epigenetic  Effects 

The  idea  that  environmental  influences  (e.g.,  food  supply,  exposure  to  chemicals,  or 
stress)  that  were  experienced  by  parents  or  grandparents  may  define  the  health 
(predisposition  to  diabetes,  cardiovascular  disease,  response  to  stress,  obesity,  etc.) 
of  the  subsequent  generations  that  were  not  exposed  to  such  influences  is  an 
attractive  explanation  for  complex  behavioral  and  metabolic  phenotypes  (Pembrey 
et  al.  2006;  Crews  et  al.  2012;  Guerrero-Bosagna  and  Skinner  2012)  (reviewed  in 
Latham  et  al.  2012).  This  hypothesis  has  been  supported  by  several  studies  in 
different  mammals  (Anway  et  al.  2005;  Anway  and  Skinner  2006;  Pembrey 
et  al.  2006;  Golding  et  al.  2010;  Crews  et  al.  2012;  Guerrero-Bosagna  and  Skinner 
2012; Nilsson et al. 2012). The proposed underlying mechanism is that the environmental
stimulus changes the epigenetic profile of certain genes either in the
germ  line  or  in  the  early  embryo  and  the  memory  of  the  environmental  exposure  is 
thereby  transmitted  in  the  form  of  an  epigenetic  mark  through  the  parental 
germ  line. 

In  principle,  environmental  exposures  may  cause  at  least  three  different  types  of 
transgenerational  effects. 

(a)  Drugs  or  toxins  induce  mutations  and  these  mutations  in  turn  cause  changes  in 
DNA  methylation. 

(b)  Environmental  factors  cause  stochastic  but  stable  DNA  methylation  changes 
(epimutations)  in  the  absence  of  genetic  mutations.  These  epimutations  are 
never reversed and will be transmitted unchanged through multiple generations
unless they are lethal or cause infertility (Anway et al. 2005; Chong
et al. 2007a; Guerrero-Bosagna and Skinner 2012).

(c) Environmental factors influence the probability of certain DNA methylation
profiles  at  specific  target  loci  and  act  in  the  germ  line  (Wolff  et  al.  1998;  Cooney 
et  al.  2002;  Waterland  and  Jirtle  2003). 

It  is  also  important  to  keep  in  mind  that  living  organisms  are  open  systems 
constantly  interacting  with  the  environment,  and  hence  changes  in  gene  expression 
profiles  rapidly  occur  in  response  to  environmental  cues.  For  example,  changes  in 
gene  expression  are  associated  with  rapid  and  reversible  changes  in  histone 
modifications  at  loci  that  are  essential  for  the  specific  response  (Holloway 
et  al.  2002;  Chinnusamy  and  Zhu  2009;  Koike  et  al.  2012;  Satake  and  Iwasa 
2012). Such environmentally induced epigenetic responses are transient and therefore
do not usually cause transgenerational epigenetic inheritance in higher
organisms  (Chinnusamy  and  Zhu  2009). 

The  complexity  of  phenotypes  and  the  large  number  of  potential  target  genes 
make  it  difficult  to  dissect  the  underlying  molecular  mechanism  and  directly  test  the 
hypothesis  of  the  environmental  origins  of  transgenerational  epigenetic  effects, 
particularly  in  humans.  Transmission  of  epigenetic  changes  through  germ  cells 
has been demonstrated in only a few of these studies (Anway et al. 2005). Moreover,
the potential role of new mutations or genetic variants has not been ruled out
in  many  cases  (Anway  et  al.  2005;  Kaati  et  al.  2007;  Golding  et  al.  2010;  Borghol 
et  al.  2012).  Therefore,  strictly  speaking,  although  a  number  of  observations  are 
compatible  with  transgenerational  epigenetic  inheritance  through  the  germ  line, 
solid  experimental  evidence  confirming  such  an  inheritance  mechanism  exists  for 
only  a  small  number  of  cases.  Inbred  mouse  models,  the  Avy  and  AxinFu  mice,  bring 
insight  into  the  interaction  between  environment  and  epigenetic  memory.  In  these 
mouse  models,  the  phenotype  depends  on  only  one  mutation,  and  the  role  of  DNA 
methylation  in  the  phenotypic  manifestation  of  the  genotype  has  been  well 
established.  The  Avy  mouse  was  used  to  investigate  whether  the  levels  of  folic 
acid  in  maternal  diet  influenced  the  penetrance  of  the  Avy  mutation  (Wolff 
et  al.  1998;  Cooney  et  al.  2002;  Waterland  and  Jirtle  2003). 

Folic  acid  is  part  of  the  metabolic  pathway  involved  in  production  of  the 
universal  methyl  group  donor  S-adenosyl  methionine,  which  provides  methyl 
groups  for  DNA  and  protein  methylation.  Decreased  folic  acid  levels  in  the 
maternal  diet  during  pregnancy  reduced  the  IAP  methylation  level  in  the  offspring 
and  thereby  increased  the  penetrance  of  the  yellow  coat  color  or  the  kinked  tail 
phenotype  as  described  above  (Wolff  et  al.  1998;  Cooney  et  al.  2002;  Waterland 
and  Jirtle  2003;  Waterland  et  al.  2006).  Conversely,  folic  acid  supplementation 
during  pregnancy  in  the  same  dams  increased  the  proportion  of  offspring  with  high 
IAP  methylation  and  pseudoagouti  coat  color  or  normal  tails  (Waterland  and  Jirtle 
2003; Waterland et al. 2006). However, the diet-induced hypermethylation of the
IAP  in  Avy  dams  did  not  accumulate  over  generations  when  several  generations  of 
females  were  fed  a  folic  acid-supplemented  diet  (Waterland  et  al.  2007).  Moreover, 
the proportion of pseudoagouti mice declined over generations. The most plausible
explanation is that diet-induced DNA methylation was efficient in somatic cells of the
F0 generation, but was actively erased or selectively not maintained in the germ
cells of the female embryos in these experiments (Waterland et al. 2007). In
summary, environmental agents may induce de novo genetic mutations and heritable
epimutations or shift the epigenetic variation in such a way that certain
epigenetic  variants  will  have  higher  chances  of  appearing  among  the  offspring.  In 
the context of complex traits, environmental exposures may influence the penetrance
of phenotypes. By influencing the chances that certain epigenetic variants
would  occur,  environmental  exposures  may  modulate  the  prevalence  of  certain 
phenotypes  in  the  population.  The  caveat  here  is  that  targeted  epigenetic 
modifications  of  specific  genes  or  regions  are  not  possible  at  this  time.  Nonspecific 
environmental  factors  such  as  diet  or  chemicals  affect  epigenetic  modifications  in 
the  whole  genome  and  reducing  the  risk  of  disease  in  one  individual  may  increase 
the  risk  in  another  as  shown  for  the  folic  acid  supplementation  and  colon  cancer 
development  in  mice  and  humans  (Knock  et  al.  2006;  Van  Guelpen  et  al.  2006; 
Lawrance  et  al.  2009). 

Since each of us is born with specific genetic and epigenetic baggage, manipulation
of environmental factors seems to be the only practical solution that is
available  to  fight  genetic  disease.  The  remote  possibility  that  one  day  we  could 
improve  our  phenotypes  through  targeted  modification  of  epigenetic  marks  is  a  very 
appealing  one,  if  judged  by  the  number  of  scientific  papers  on  environment  and 
epigenetics.  A  PubMed  search  using  key  words  "environment"  and  "epigenetics" 
returned  a  total  of  574  citations  dating  from  1997  to  2012.  Of  those,  81  %  (466) 
were  published  in  the  last  4  years  (2009-2012).  Therefore,  the  questions  of  the 
degree  to  which  the  environment  may  or  may  not  influence  the  epigenome;  which 
are  the  points  of  epigenetic  vulnerability  in  the  individual's  development;  as  well  as 
the  more  specific  question  of  how  environmental  exposures  in  the  parental  and 
grandparental  generations  influence  the  phenotypes  of  their  offspring  will  occupy 
the  minds  of  researchers  for  years  to  come  and  bring  new  and  exciting 
developments.


Chapter 6

Autosomal  Monoallelic  Expression 


Virginia  Savova  and  Alexander  A.  Gimelbrant 


Abstract In mammals, relative expression of the maternal and paternal alleles of many
genes is controlled by three types of epigenetic phenomena: X chromosome inactivation,
imprinting, and mitotically stable autosomal monoallelic expression (MAE). MAE
imposes  a  mitotically  stable  allelic  imbalance  in  the  expression  of  a  significant  fraction 
of  human  autosomal  genes.  Cells  in  the  same  individual  make  independent  choices 
of  active  and  inactive  alleles,  leading  to  remarkable  epigenetic  diversity  between 
otherwise  identical  clonal  lineages.  Genes  subject  to  MAE  play  critical  roles  in  a 
variety  of  major  disorders,  including  schizophrenia,  Alzheimer's  disease,  and  cancer. 
In  this  chapter,  we  review  the  current  state  of  understanding  of  MAE  biology,  and  assess 
various  implications  of  MAE  for  analysis  of  genotype-phenotype  relationship. 


6.1    Biology  of  Autosomal  Monoallelic  Expression 

A  variety  of  genetic  and  epigenetic  factors  affect  the  relative  expression  levels  of  the 
two  copies  of  each  given  gene  in  diploid  cells.  In  addition  to  genetic  variation  in 
regulatory  regions  that  affects  allele-specific  expression  (Cowles  et  al.  2002;  Yan 
et al. 2002), there are at least three major kinds of non-Mendelian, epigenetic

Fig. 6.1 Autosomal monoallelic expression in the context of epigenetic mechanisms controlling
allele-specific  expression  in  mammalian  cells.  Main  mechanistic,  developmental,  and  functional 
questions  for  the  three  major  mechanisms  affecting  allelic  expression  imbalance.  Notes:  1  = 
reviewed  in  Abramowitz  and  Bartolomei  (2012)  and  Kelsey  and  Bartolomei  (2012);  2  =  reviewed 
in  Augui  et  al.  (2011)  and  Jeon  et  al.  (2012);  3  =  see  references  throughout  this  chapter 


phenomena that control allele-specific expression in mammals (see Fig. 6.1). One is the
X  chromosome  inactivation:  During  the  development  of  female  embryos,  around  the 
time  of  implantation,  about  half  of  the  cells  choose  to  inactivate  the  maternal  X,  and  the 
rest  inactivate  the  paternal  X,  affecting  most  of  the  X-linked  genes  (Berletch  et  al.  2010; 
Carrel  and  Willard  2005;  Lyon  1961;  Yang  et  al.  2010).  Another  is  imprinting:  Genes 
such as IGF2 and H19 are expressed either from the paternal or from the maternal allele
(reviewed in Glaser et al. 2006). These two epigenetic mechanisms are reviewed in
the companion Chaps. 1, 3, and 5 in this volume.

In  addition,  a  significant  fraction  of  mammalian  genes  are  subject  to  autosomal 
monoallelic  expression  (MAE).  MAE  is  observed  in  olfactory  receptor  genes 
(Chess  et  al.  1994),  as  well  as  genes  coding  for  immunoglobulins  and  some 
cytokines. Using genome-wide analyses of allele-specific expression, we and others
have  added  a  surprisingly  large  number  of  the  autosomal  genes  in  human  and 
mouse  to  the  MAE  class,  including  genes  implicated  in  a  number  of  major  human 
diseases,  such  as  Alzheimer's  disease  (APP)  and  cancer  (DAPK1)  (Gimelbrant 
et  al.  2007;  Jeffries  et  al.  2012;  Li  et  al.  2012;  Zwemer  et  al.  2012).  MAE  affects 
about  10  %  of  ~4,000  tested  genes  in  human  lymphoblastoid  cells  and  about  15  % 
of  more  than  1,300  assessed  genes  in  equivalent  mouse  cells.  Note  that  this  count 
excludes  olfactory  receptor  genes  which  by  themselves  constitute  about  5  %  of 
mammalian  protein-coding  genes.  There  is  also  evidence  that  ribosomal  DNA  gene 
clusters are subject to mitotically stable MAE (Schlesinger et al. 2009). Overall, of
the three epigenetic mechanisms (depicted in Fig. 6.1), MAE affects by far the
greatest  number  of  genes. 

MAE  is  an  "autosomal  analog  of  X  inactivation"  in  the  sense  that  it  creates 
epigenetic mosaicism: Otherwise identical cells in the same individual have different
"epigenotypes" such that some cells express the maternal copy of a gene, and
other cells express the paternal copy of that gene. If the two alleles of the gene are
functionally distinct, this can result in emergence of subpopulations of cells with
different functional properties, such as differential responsiveness to lipopolysaccharides
in lymphocytes of mice heterozygous for a mutant variant of the Tlr4 gene
(Pereira  et  al.  2003).  Since  X  chromosome  inactivation  is  much  better  known,  we 
will  structure  this  discussion  by  comparing  and  contrasting  MAE  and  X 
inactivation. 

Similarly  to  X  inactivation,  once  the  choice  of  the  active  allele  is  made,  it  is 
maintained  in  a  mitotically  stable  manner.  For  example,  in  one  experiment,  mouse 
cells maintained near-complete silencing of one copy of p120 catenin after a year in
continuous  culture  (Gimelbrant  et  al.  2005).  More  generally,  since  most  systematic 
assessments  of  MAE  were  performed  on  cell  populations  grown  from  a  single  cell 
to  10-10°  cells  (Gimelbrant  et  al.  2007;  Zwemer  et  al.  2012),  we  can  conclude 
with  confidence  that  the  allelic  choices  genome-wide  have  been  maintained  through 
dozens  of  cell  divisions. 

One  important  consequence  of  these  stable  differences  between  clonal  lineages 
is that in heterogeneous, polyclonal samples MAE could pass undetected (see
Fig.  6.2).  Not  coincidentally,  MAE  was  discovered  by  analyzing  specific  clonal  cell 
populations  of  immune  cells  (Bix  and  Locksley  1998;  Hollander  et  al.  1998;  Pernis 
et  al.  1965),  or  in  the  context  of  analysis  of  individual  cells,  such  as  olfactory 
sensory  neurons  (Chess  et  al.  1994),  and  kept  resurfacing  when  single-cell  or  clonal 
population  approaches  were  used  (Takizawa  et  al.  2008).  Strictly  speaking,  if  a 
gene  with  MAE  is  not  mitotically  stable  and  allows  switching  of  expression 
between  alleles  upon  mitosis,  it  would  remain  undetected  in  an  analysis  of 
allele-specific expression, even in a single-cell clone. We have seen no evidence of
such instability in individual genes tested using short clone expansion times (analyzed
after expanding to a few hundred cells) or FISH analysis of individual cells from a
population  showing  biallelic  expression  of  a  given  gene  (Gimelbrant  et  al.  2005, 
2007),  but  it  remains  possible  that  genes  that  show  such  labile  patterns  do  exist. 
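
The masking effect of polyclonality described above can be sketched numerically. The following Python example is a simplified illustration, not a model taken from the cited studies: it assumes that every clone silences one allele completely, chooses the maternal or paternal allele with equal probability, and contributes equally to the pooled sample. The function name and the number of clones are hypothetical.

```python
import random


def pooled_maternal_fraction(n_clones, rng):
    """Fraction of total expression coming from the maternal allele when
    n_clones monoallelic clones of equal size are pooled.

    Assumption (illustrative only): each clone independently expresses
    either the maternal or the paternal allele, never both.
    """
    choices = [rng.choice(("maternal", "paternal")) for _ in range(n_clones)]
    return choices.count("maternal") / n_clones


rng = random.Random(0)  # fixed seed so that the run is repeatable

# A single clone is fully monoallelic: the fraction is exactly 0.0 or 1.0.
single = pooled_maternal_fraction(1, rng)

# A polyclonal sample averages many independent choices and looks biallelic.
pooled = pooled_maternal_fraction(10_000, rng)

print(single, round(pooled, 2))
```

In this toy setting the pooled sample yields a maternal fraction close to one half even though no individual cell expresses both alleles, which is why clonal or single-cell material is needed to detect MAE.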

Unlike  X  inactivation,  MAE  is  observed  both  in  male  and  female  cells.  In  genes 
that  show  MAE,  allelic  bias  is  often  very  strong:  complete  silencing  of  one  allele  for 
olfactory  receptors  and  immunoglobulins,  and  tenfold  or  greater  bias  in  allelic 
expression  for  many  other  autosomal  genes  (Gimelbrant  et  al.  2007).  At  the  same 
time,  MAE  appears  more  variable  than  X  inactivation.  While  X  inactivation  is 
ubiquitous  (i.e.,  occurs  in  all  somatic  cells  of  a  postimplantation  female  embryo), 
currently,  there  is  no  definitive  knowledge  of  tissue  specificity  of  MAE,  as  even 
clones  of  the  same  cell  type  are  different  from  each  other.  However,  small 
patches of placenta and clonal fibroblasts have a similar fraction of MAE genes
(Gimelbrant  et  al.  2007),  and  analysis  of  individual  MAE  genes  in  multiple  clones 




(Fig. 6.2 graphic: clones; allelic state shown as maternal, biallelic, or paternal expression)

Fig.  6.2  Clonal  cells  show  unique  pattern  of  monoallelic  expression.  Allelic  choice  for  MAE 
genes  in  human  lymphoblastoid  cells.  Schematic  representation  of  maternal,  paternal,  or  biallelic 
expression  of  MAE  genes  on  chromosome  18  (based  on  the  data  from  Gimelbrant  et  al.  2007). 
Vertical  lines  correspond  to  individual  clones;  dotted  horizontal  lines  mark  genes.  Gene  marker  to 
the  right  of  vertical  line — maternal  expression;  to  the  left — paternal  expression;  present  on  both 
sides — biallelic  expression.  Nine  clones  were  derived  from  three  individuals.  Note  the  lack  of 
coordination  of  allelic  choice  along  the  chromosome  in  the  same  clone,  and  independent  choice  of 
allelic  state  of  the  same  gene  between  clones 


shows  that  a  gene  could  be  subject  to  MAE  in  one  tissue  and  not  subject  in  another 
(Gimelbrant  et  al.  2005). 
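
A minimal sketch of how allelic bias might be scored from allele-specific measurements is given below. It assumes hypothetical per-allele read counts and uses the tenfold bias mentioned above as an illustrative cutoff. The function name, the cutoff, and the three-way classification are assumptions made for this example; the published analyses used their own criteria.

```python
def allelic_state(maternal_reads, paternal_reads, fold_threshold=10.0):
    """Classify one gene in one clone from allele-specific read counts.

    Assumption (illustrative only): a gene is called monoallelic when one
    allele is expressed at least fold_threshold times more strongly than
    the other. Weaker biases are lumped together as "biallelic".
    """
    if maternal_reads + paternal_reads == 0:
        return "not assessed"  # no informative reads for this gene
    if maternal_reads >= fold_threshold * paternal_reads:
        return "maternal"
    if paternal_reads >= fold_threshold * maternal_reads:
        return "paternal"
    return "biallelic"


# Hypothetical counts for the same gene in three clones.
for counts in [(95, 5), (40, 60), (0, 30)]:
    print(counts, allelic_state(*counts))
```

Applying such a rule clone by clone is what allows the same gene to be scored as monoallelic in one clone and biallelic in another.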

While  MAE  is  obligatory  for  olfactory  receptor  genes  (Serizawa  et  al.  2000)  and 
in  the  allelic  exclusion  of  immunoglobulin  genes  (Jung  et  al.  2006),  for  most  other 
autosomal  MAE  genes,  MAE  is  not  obligatory:  in  mouse  and  human  lymphocytes, 
more  than  80  %  of  MAE  genes  were  biallelic  in  at  least  one  assessed  clone 
(Gimelbrant  et  al.  2005,  2007).  This  somewhat  resembles  variability  among 
genes  that  escape  X  inactivation  (Carrel  and  Willard  2005),  though  it  is  unlikely 
that  similar  mechanisms  are  involved.  The  presence  of  two  active  alleles  vs.  one 
active  allele  in  clones  of  otherwise  similar  cells  resulted  in  (about  twofold)  higher 
expression  levels  at  least  for  a  small  number  of  assessed  genes  (Gimelbrant 
et  al.  2005,  2007).  In  this  connection,  it  is  worth  noting  that  the  temporal  changes 
in  Nanog  RNA  levels  during  differentiation  correlate  with  its  allelic  state:  MAE 
corresponds  to  lower  levels,  and  higher  RNA  levels  to  biallelic  expression 
(Miyanari and Torres-Padilla 2012). A systematic study is needed to establish
whether that is a general rule.


6.2    Open Questions in MAE Biology

Our  growing  appreciation  of  the  prevalence  of  MAE  only  underscores  how  little  we 
know  about  its  biology.  Here  we  summarize  unresolved  questions  (see  also 
Fig.  6.1). 

MAE  was  directly  observed  in  vivo  using  fluorescent  protein  reporters  in 
interleukin-4  (Hu-Li  et  al.  2001)  and  olfactory  receptor  genes  (Serizawa 
et  al.  2000),  and  fluorescent  in  situ  hybridization  in  olfactory  epithelium  (Chess 
et  al.  1994)  and  in  human  peripheral  blood  lymphocytes  for  several  genes  initially 
detected  as  MAE  in  lymphoblastoid  cells  (Gimelbrant  et  al.  2007).  However,  MAE 
clonality  makes  its  assessment  on  a  genome-wide  scale  very  challenging;  thus  the 
only  large-scale  sets  of  data  are  collected  in  clonal  cell  lines  in  vitro,  primarily 
lymphoblastoid  cells.  The  limited  number  of  clones  thus  analyzed  is  insufficient  to 
generate  a  complete  catalog  of  MAE  genes  in  lymphoblastoid  cells,  and  little  is 
known  about  the  prevalence  of  MAE  in  other  cell  types.  Virtually  nothing  is  known 
about  the  establishment  of  MAE  during  development  and  differentiation. 

Mechanistically,  allelic  choice  has  been  linked  to  changes  in  chromatin  states  in 
two  special  cases:  olfactory  receptor  gene  choice  (Magklara  et  al.  2011)  and 
immunoglobulin-kappa  gene  rearrangement  (Farago  et  al.  2012).  In  contrast,  for 
hundreds  of  other  autosomal  MAE  genes,  no  molecular  features  have  so  far  been 
reported  to  be  strongly  associated  with  the  establishment  and  maintenance  of  allelic 
choice,  even  though  intriguing  correlations  have  been  observed:  (1)  MAE  genes  are 
more  likely  to  be  located  close  to  recombination  hot  spots  (Necsulea  et  al.  2009) 
and  (2)  near  clusters  of  related  genes  (Gimelbrant  and  Chess  2006)  and 
(3)  promoters  of  MAE  genes  are  enriched  in  a  statistically  significant  manner 
with  chromatin  modifications  associated  with  both  open  chromatin  (histone  H3 
Lys-4  methylation)  and  inactive  chromatin  (histone  H3  Lys-27  methylation)  (Bock 
et  al.  2009).  Non-imprinted  allele-specific  DNA  methylation  had  been  reported  in 
lymphoblasts  (Zhang  et  al.  2009);  however,  since  cell  culture  might  severely  affect 
DNA  methylation  levels  (Antequera  et  al.  1990),  making  lymphoblasts  potentially 
unreliable  in  this  respect  (Saferali  et  al.  2010),  the  role  of  DNA  methylation  in  MAE 
remains  unclear. 

In marked contrast with X inactivation, there is no chromosome-wide coordination
in the choice of the active and inactive alleles. Instead, genes subject to MAE
are  interspersed  with  biallelic  genes  throughout  the  genome,  with  each  locus'  allelic 
choice  apparently  independent  of  any  other  locus  (see  Fig.  6.2).  A  parsimonious 
explanation  posits  that  allelic  choice  involves  separate,  independent  regulatory 
regions  in  proximity  of  each  autosomal  MAE  locus.  These  hypothetical  sequences 
can be conceptualized as being functionally similar to the X inactivation center (Augui
et  al.  2011),  but  affecting  a  small  region  rather  than  the  whole  chromosome. 

Finally,  there  is  no  satisfactory  general  explanation  of  the  biological  function  of 
MAE.  In  specialized  cases  of  olfactory  receptors  and  immunoglobulins,  plausible 
functional  accounts  exist:  these  are  the  receptors  that  impose  a  unique  and  precise 
specificity onto otherwise identical cells (such as T- or B-lymphocytes, or sensory
neurons), which could be impaired by co-expression of the other allele. There is no
such  intuitively  appealing  hypothesis  with  regard  to  the  majority  of  MAE  genes, 
such  as  interleukins  or  amyloid  precursor  protein.  One  possible  hint  is  provided  by 
the  observation  that  among  MAE  genes,  there  is  a  strong  overrepresentation  of 
genes  coding  for  cell-surface  proteins  (Gimelbrant  et  al.  2007).  Between  dosage 
variation  (for  non-obligatory  MAE  genes)  and  independent  choice  of  allele  at  each 
of  many  hundreds  of  loci,  MAE  can  serve  as  a  combinatorial  mechanism  providing 
a  unique  signature  to  any  number  of  cells  or  clonal  lineages  at  their  interface  to 
extracellular environment. This epigenetic diversity might be functionally important
in the development of complex organs (cf. the role played by the DSCAM gene in
Drosophila:  due  to  extensive  alternative  splicing,  it  can  encode  thousands  of 
isoforms,  whose  homophilic  repulsion  is  the  basis  of  neuronal  self-recognition 
and  self-avoidance)  (reviewed  in  Hattori  et  al.  2008).  We  propose  that  such 
diversity  can  play  a  crucial  biological  role  by  providing  "passive  immunity":  an 
epigenetic  mosaic  of  clones,  by  virtue  of  presenting  subtly  distinct  combinations  of 
surface  proteins  to  infectious  agents,  can  have  different  levels  of  susceptibility 
rather  than  form  a  uniformly  susceptible  "monoculture."  Finally,  MAE  might  be  a 
nonadaptive  consequence  of  some  other  property  of  particular  genomic  areas.  To 
perform  a  crucial  experiment  determining  MAE  function,  one  would  need  to 
identify  mechanisms  involved  in  the  establishment  and  maintenance  of  the  MAE, 
eliminate  MAE  in  a  model  organism,  and  assess  the  impact  of  that  manipulation  in 
the  organism's  development  and  function. 


6.3  Autosomal Monoallelic Expression, Clonality, and Analysis of Genotype-Phenotype Relationship

Sequence-based  variation  in  gene  expression  is  a  key  driver  of  disease  risk. 
Identification  of  sequence  variants  regulating  expression  in  cis  is  a  major  effort  in 
genetics  of  complex  traits  (Gilad  et  al.  2008).  Whatever  proves  to  be  the  ultimate 
explanation  of  the  MAE  function,  existence  of  MAE  in  mammalian  cells  introduces 
a  number  of  challenges  into  analysis  and  interpretation  of  a  relationship  between 
genotype  and  phenotype,  both  in  the  context  of  efforts  to  understand  the  biology  of 
gene  regulation  and  in  the  context  of  personalized  medicine. 

The  central  issue  is  epigenetic  heterogeneity  introduced  by  MAE — the  fact  that 
clonal  lineages  which  are  "siblings"  of  each  other  can  be  quite  distinct  in  the 
spectrum  of  alleles  of  MAE  genes  that  are  chosen  and  are  maintained  over  long 
periods  of  time  (Fig.  6.3).  This  epigenetic  heterogeneity  means  that  knowledge  of 
the  average  expression  state  in  a  polyclonal  tissue  as  a  whole  does  not  translate  into 
precise  information  about  the  state  of  individual  cells.  Conversely,  the  state  of  a 
particular  clonal  lineage  may  not  translate  into  complete  information  about  overall 
level  of  expression  or  allelic  bias  in  a  given  cell  type,  even  when  assessing  cells 
with  exactly  the  same  genotype  (e.g.,  cells  from  the  same  individual). 
Fig. 6.3 Relationship between allele-specific expression in individual clones and in polyclonal
tissue.  Schematic  depiction  of  clone-specific  allelic  choice  for  multiple  genes  in  three  hypothetical 
clonal  cell  subpopulations  in  one  individual,  and  the  resulting  allelic  (im)balance  in  the  polyclonal 
tissue.  Each  clone  is  represented  by  a  pair  of  chromosomes  (shown  as  vertical  lines).  Angled 
arrows — active  transcription;  left-side  arrow — paternal  expression;  right-side  arrow — maternal 
expression;  cross — transcriptional  silencing 


This  is  not  a  radically  novel  problem:  indeed,  it  is  quite  similar  to  pronounced 
differences  between  cells  of  the  same  type  that  X  inactivation  can  lead  to.  The 
heterogeneity between individual cells that combine into cell type average expression
values is particularly important when considering cell-autonomous functions,
and in particular, tumor initiation. In the case of X inactivation, a remarkable example
can be found in women heterozygous for a loss-of-function variant of the FOXP3 gene
(Zuo  et  al.  2007).  This  gene,  encoding  a  transcription  factor  with  tumor  suppressor 
activity,  is  X-linked  in  both  humans  and  mice,  and  heterozygous  knockout  mice 
developed  mammary  tumors  at  a  high  rate.  When  clinical  samples  of  breast  cancer 
in  heterozygous  women  were  analyzed,  the  tumors  nearly  always  arose  from  cells 
that had inactivated the X chromosome with the functional copy of the gene,
resulting in a functional loss of FOXP3. In other words, different epigenotypes
(with  respect  to  the  activity  of  one  or  the  other  copy  of  FOXP3)  led  to  dramatically 
different  outcomes  in  cell  fate. 

Below  we  discuss  several  situations  in  which  one  might  want  to  be  aware  of 
MAE  while  dissecting  the  interaction  between  genotype  and  phenotype. 

Analysis of genetic and epigenetic architecture of allele-specific expression. There
are several potential biological sources of deviation from a 50:50 allelic expression
ratio: a genetic effect of cis-regulatory variation, and several epigenetic/developmental
effects: a parent-of-origin effect (imprinting), a sampling effect of clonality,
and primary or secondary skewing of initial establishment of the allelic choice.
While  the  effects  of  those  on  X-linked  genes  have  been  discussed  (Chadwick  and 
Willard  2005;  Heard  et  al.  1997;  Wang  et  al.  2010),  the  clonality  effects  are  not 
routinely  taken  into  consideration  for  autosomal  genes.  Note  that  in  the  following 
discussion  of  the  clonality  effects  we  completely  leave  aside  any  questions  of 
technical  noise  or  artifactual  systematic  bias  introduced  during  measurement 
(such  as  preferential  amplification  or  detection  of  one  allele  over  the  other);  we 
focus  on  biological  factors  affecting  allelic  bias  as  it  "truly  is"  in  the  sample. 
Collection sampling effect. The process of sampling biological material can lead to
overrepresentation of a particular clonal lineage. For example, lymphoblastoid cell
lines,  which  are  commonly  used  in  expression  quantitative  trait  loci  (eQTL)  studies 
of  multiple  individuals  (Dixon  et  al.  2007;  Lee  et  al.  2013),  can  be  monoclonal  or 
oligoclonal  in  20  %  of  cases  (Pastinen  et  al.  2004;  Plagnol  et  al.  2008).  For  MAE 
genes,  the  allelic  bias  in  these  samples  is  determined  by  the  choice  of  alleles  in  the 
dominant  clones  and  may  not  be  representative  of  the  cell  type  as  a  whole.  This  could 
be particularly salient for studies designed to reliably distinguish cis-regulatory
effects from trans effects by measuring allele-specific expression at heterozygous
loci  (Hull  et  al.  2007;  Pickrell  et  al.  2010). 

Even  samples  that  are  perfectly  polyclonal  and  representative  of  a  given 
individual  can  be  affected  by  the  sampling  effect  caused  by  the  small  number  of 
cells  at  the  moment  of  commitment  to  the  active  allele  choice.  Depending  on  the 
developmental  time  when  the  active  allele  is  chosen,  the  number  of  cells  involved 
might  be  fairly  small  [at  the  time  of  X  inactivation,  there  are  just  about  15  cells  that 
will  give  rise  to  the  entire  hematopoietic  system  (Tonon  et  al.  1998)],  resulting  in 
large  skewing  in  a  significant  fraction  of  individuals. 
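
The founder-cell sampling effect described above can be illustrated with a short binomial calculation. The following Python sketch is an illustration under stated assumptions, not an analysis from the cited studies: each founder cell is assumed to choose its active allele independently, every founder is assumed to contribute equally to the adult tissue, and the function name `prob_skewed` and the 80:20 skewing threshold are hypothetical choices made for this example.

```python
from fractions import Fraction
from math import comb


def prob_skewed(n_founders: int,
                threshold: Fraction,
                p_allele_a: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability that at least `threshold` of the founder cells
    committed to the same allele (in either direction).

    Assumptions (illustrative only): founders choose independently,
    allele A is chosen with probability `p_allele_a`, and all founders
    contribute equally to the tissue that is later sampled.
    """
    total = Fraction(0)
    for k in range(n_founders + 1):  # k = founders with allele A active
        major_share = Fraction(max(k, n_founders - k), n_founders)
        if major_share >= threshold:
            # Exact binomial probability of exactly k founders choosing A.
            total += (comb(n_founders, k)
                      * p_allele_a ** k
                      * (1 - p_allele_a) ** (n_founders - k))
    return total


# With a fair choice and 15 founders, compare against a large founder pool.
print(float(prob_skewed(15, Fraction(4, 5))))
print(float(prob_skewed(1000, Fraction(4, 5))))
```

Under these assumptions, a pool of 15 founders yields 80:20 or stronger skewing in about 3.5 % of individuals purely by chance, whereas with 1,000 founders such skewing is practically never expected. Exact fractions are used rather than floating point so that the threshold comparison is not affected by rounding.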

Moreover,  an  observed  allelic  bias  in  polyclonal  samples  can  result  from  primary 
and secondary allelic skewing. Primary skewing refers to an "unfair coin"
used in deciding which allele will be active in a given lineage. In the case of X
inactivation in mice, this is determined by which variant of the Xce locus is present on
each of the X chromosomes (Chadwick et al. 2006; Chadwick and Willard 2005;
Heard et al. 1997; Plenge et al. 2000). For example, in a Cast/Ei × 129 F1 cross, only
about 20 % of cells will have the 129-derived X active and the Cast-derived X inactive. There is evidence that
skewing affects MAE genes, at least in mouse cells: for multiple genes, one allele
is much more likely than the other to be the only one expressed (Zwemer
et al. 2012). While such primary allelic skewing is presumably genetically driven, its
consequences differ from cis-acting variation, which causes uniform allelic
skewing present in each cell. In the case of MAE, average allelic imbalance masks a
fraction  of  cells  that  may  be  biallelic,  or  skewed  in  the  other  direction.  Secondary 
clonal  skewing  can  result  from  preferential  survival  of  a  clone  (Plenge  et  al.  2002) 
that  could  be  related  to  its  functional  properties,  or  could  be  due  to  stochastic  events 
during developmental bottlenecks. By studying mother-neonate pairs, Bolduc
et al. (2008) showed that X inactivation skewing is both present at significant rates in
humans  (8-28  %  dependent  on  age  and  tissue  type)  and  not  heritable,  suggesting 
secondary  skewing  as  the  dominant  cause. 

Analysis  of  total  expression  level  and  its  relationship  to  function.  Because  of  the 
propensity  of  MAE  genes  to  show  stably  biallelic  expression  in  a  significant 
fraction  of  cells,  and  biallelic  expression  typically  corresponding  to  higher  level 
of  transcript  than  monoallelic  expression,  the  clonal  composition  of  a  sample 
affects  the  overall  expression  level  of  the  gene,  apart  from  having  an  effect  on 
allelic  bias.  In  a  tissue,  the  average  level  of  expression  of  an  MAE  gene  is 
determined  by  its  RNA  levels  in  clones  with  biallelic,  paternal,  or  maternal 
monoallelic  expression,  as  well  as  the  fractional  representation  of  each  type  of 
clone in the general population of cells. Thus the expression data from a particular
clonal  sample  is  not  necessarily  representative  of  the  cell  type  as  a  polyclonal 
whole,  and  vice  versa.  This  should  be  taken  into  account  when  systematically 
comparing  samples  of  different  clonality,  such  as  tumors  and  matched  normal 
tissue. 
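
The dependence of tissue-level measurements on clonal composition can be made concrete with a small weighted-average calculation. The Python sketch below is only an illustration: the class `CloneClass`, the function `tissue_summary`, the arbitrary expression units, and the example fractions are all hypothetical and are not taken from the studies cited here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CloneClass:
    fraction: float  # share of cells in the tissue belonging to this class
    maternal: float  # maternal-allele transcript level per cell (arbitrary units)
    paternal: float  # paternal-allele transcript level per cell (arbitrary units)


def tissue_summary(classes: Iterable[CloneClass]) -> Tuple[float, float]:
    """Return (mean total expression per cell, maternal share of transcripts)
    for a polyclonal tissue made up of the given clone classes."""
    classes = list(classes)
    if abs(sum(c.fraction for c in classes) - 1.0) > 1e-9:
        raise ValueError("clone fractions must sum to 1")
    maternal = sum(c.fraction * c.maternal for c in classes)
    paternal = sum(c.fraction * c.paternal for c in classes)
    total = maternal + paternal
    if total == 0:
        raise ValueError("gene is not expressed in any clone class")
    return total, maternal / total


# Hypothetical tissue: 30 % biallelic, 50 % maternal-only, 20 % paternal-only,
# assuming each active allele contributes one unit of transcript.
example = [
    CloneClass(fraction=0.3, maternal=1.0, paternal=1.0),
    CloneClass(fraction=0.5, maternal=1.0, paternal=0.0),
    CloneClass(fraction=0.2, maternal=0.0, paternal=1.0),
]
print(tissue_summary(example))
```

In this hypothetical tissue the bulk measurement shows a moderate maternal bias, although no individual cell has that allelic ratio: every cell is either balanced or fully monoallelic. A monoclonal sample drawn from any one class would report a different total level and a different allelic ratio from the tissue as a whole.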

This  has  bearing  on  the  important  question  of  identification  of  potential 
oncogenes.  Overexpression  in  tumor  samples  compared  to  normal  tissue  even 
without  evidence  for  DNA  amplification  is  considered  a  principal  form  of  evidence 
for  oncogene  identification  (Ko  et  al.  2003;  Santarius  et  al.  2010).  A  common 
strategy  for  uncovering  oncogenic  mutations  in  non-blood  cancers  is  to  obtain 
genotype  and  expression  data  from  tumors  and  matched  blood  samples  from  the 
same  patients  (Li  et  al.  2013),  or  large-scale  transcriptional  profiling  using  tumor 
samples  from  patients  and  normal  tissue  samples  from  healthy  individuals  (Gordon 
et  al.  2005).  Clonal  variability  of  expression  levels  of  MAE  genes  is  a  likely 
confounding factor. A clonal outgrowth originating from a cell that happened to
have the gene in question biallelically expressed will exhibit overexpression compared
to normal tissue. For example, if 30 % of the cells in the normal tissue have
biallelic  expression  for  a  given  MAE  gene,  we  can  expect  that  approximately  30  % 
of  tumors  originating  in  cells  of  that  type  will  show  evidence  of  overexpression  in 
that  gene,  even  if  the  gene  in  question  has  no  effect  on  tumor  initiation  or 
progression. 
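
The arithmetic behind this example can be sketched as follows. The Python snippet is a hedged illustration: the function name `tumor_fold_changes` is hypothetical, and the default assumption that a biallelic cell carries exactly twice the transcript level of a monoallelic cell is a simplification introduced here (the text above states only that biallelic expression typically corresponds to a higher level).

```python
def tumor_fold_changes(frac_biallelic: float,
                       biallelic_level: float = 2.0,
                       monoallelic_level: float = 1.0) -> dict:
    """Compare a monoclonal tumor with the polyclonal normal tissue it arose in.

    frac_biallelic: share of normal cells expressing the MAE gene biallelically.
    The per-cell expression levels are assumed values in arbitrary units.
    """
    if not 0.0 <= frac_biallelic <= 1.0:
        raise ValueError("frac_biallelic must lie between 0 and 1")
    # Bulk level of the normal tissue is the weighted mean over both cell states.
    normal_mean = (frac_biallelic * biallelic_level
                   + (1.0 - frac_biallelic) * monoallelic_level)
    return {
        "normal_mean": normal_mean,
        # Tumor that grew out of a biallelically expressing cell.
        "fold_if_biallelic_origin": biallelic_level / normal_mean,
        # Tumor that grew out of a monoallelically expressing cell.
        "fold_if_monoallelic_origin": monoallelic_level / normal_mean,
    }


print(tumor_fold_changes(0.3))
```

With 30 % biallelic cells, a tumor of biallelic origin appears roughly 1.5-fold overexpressed relative to the bulk normal tissue, while a tumor of monoallelic origin appears mildly underexpressed. If the cell of origin is drawn without regard to its allelic state, the share of tumors showing apparent overexpression equals the share of biallelic cells, which is the 30 % figure given in the text.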

At  least  in  some  cases,  the  switch  between  monoallelic  and  biallelic  expression 
and associated change in effective gene dosage do play a functional role, as in the
example of the Nanog gene during differentiation (Miyanari and Torres-Padilla
2012).  Detailed  information  on  the  clonal  composition  and  primary  allelic  skewing 
in  the  normal  tissue  across  individuals  would  allow  testing  of  the  hypothesis  that 
such  composition  plays  a  role  in  disease  etiology. 

Analysis  of  nongenetic  variability.  MAE  is  likely  to  be  an  important  contributing 
mechanism  in  nongenetic  variability  which  plays  a  role  in  a  range  of  conditions. 
One  classic  case  of  somatic  functional  mosaicism  is  field  cancerization  (Slaughter 
et  al.  1953):  emergence  of  multiple  primary  tumors  in  a  limited  area  in  close 
proximity  to  each  other.  In  some  cases  this  phenomenon  can  be  ascribed  to  changes 
brought  by  somatic  mutations  [e.g.,  as  a  result  of  UV-damage  to  stromal  cells 
(Ratushny  et  al.  2012)].  Such  changes  can  also  be  epigenetically  driven 
(Hu  et  al.  2012).  Clonal  nature  of  MAE  makes  it  a  likely  candidate  mechanism 
for  such  local,  mosaic  effects.  Intriguingly,  a  recent  work  (Kreso  et  al.  2012) 
reported  that  epigenetic  clonal  heterogeneity  is  maintained  in  colon  cancer  in  a 
way  that  affects  response  to  chemotherapy  for  individual  clones  and  contributes  to 
development of chemotherapy resistance. MAE, with its capacity to create remarkable
epigenetic heterogeneity and maintain it over multiple cell divisions, is a
promising  candidate  for  the  underlying  mechanism.  This  also  holds  a  potential 
for  MAE-focused  epigenetic  treatments  to  prevent  tumor  initiation,  as  well  as 
reduce  drug  resistance  in  chemotherapy. 
Haploinsufficiency. A final, particularly intriguing relationship is the one between
MAE and haploinsufficiency in those cases where one wild-type allele in a heterozygote
is insufficient for normal function. Haploinsufficiency has been reported in
neurodevelopmental  disorders  such  as  Williams  syndrome  (Meng  et  al.  1998)  and 
language  delays  (Lamb  et  al.  2012);  it  also  affects  a  significant  number  of  genes 
with tumor suppressor activity (Berger and Pandolfi 2011). MAE raises the possibility
that haploinsufficiency comes in two distinct types. In the first, uniform type,
every cell in affected tissue has a similar, insufficient level of expression of the
non-MAE  gene  in  question.  In  the  second  type,  MAE  would  lead  to  nonuniformity 
in  cells  forming  the  affected  tissue,  with  some  cell  subpopulations  completely 
lacking  a  functional  copy  of  the  gene  and  other  cells  completely  normal.  The  latter 
cases  would  be  more  likely  to  result  in  variable  phenotype  due  to  variation  in  clonal 
composition  and  potential  rescue  through  allelic  skewing  [somewhat  reminiscent  of 
hemizygous  mutations  in  the  FMR  gene  leading  to  variable  phenotype  of  fragile  X 
syndrome in females (Kirchgessner et al. 1995)]. The MAE-based type of
haploinsufficiency  may  be  amenable  to  a  different  set  of  therapeutic  strategies 
than  the  uniform  type,  such  as  finding  a  way  to  reverse  the  epigenetic  allelic  choice 
or  derepress  the  silent  allele. 


6.4  Conclusion 

Autosomal  MAE  is  a  widespread  epigenetic  phenomenon  that  affects  more  than 
10  %  of  human  and  mouse  genes.  Many  mechanistic  questions  related  to  MAE  are 
open.  A  major  technical  bottleneck  in  addressing  these  questions  is  the  clonal 
nature  of  MAE:  like  X  inactivation,  MAE  is  masked  in  polyclonal  samples. 
Depending  on  the  clonal  composition  of  a  biological  sample,  MAE  can  contribute 
to  apparent  allelic  expression  bias,  as  well  as  to  noticeable  differences  in  expression 
levels.  While  it  can  present  analytical  challenges  in  a  number  of  research  strategies 
and  personalized  medicine  approaches,  MAE  also  holds  promise  for  epigenetic 
manipulation  of  functional  properties  of  cells  and  cell  populations. 
Part II

Epigenetic Variation in Health and Disease


Chapter  7 

Recurrent  CNVs  in  the  Etiology  of  Epigenetic 
Neurodevelopmental  Disorders 


Janine  M.  LaSalle  and  Mohammad  Saharul  Islam 


Abstract  Recurrent copy number variations (CNVs) are large structural gains and
losses of chromosomal domains that are emerging as a major contributor to human
neurodevelopmental disorders, such as intellectual disability, autism, schizophrenia,
and bipolar disorders. Among the most commonly causal CNVs observed in
neurodevelopmental  disorders  are  the  rearrangements  of  proximal  chromosome 
15q,  resulting  in  Angelman,  Prader-Willi,  and  15q  duplication  syndromes.  This 
locus  also  involves  multiple  epigenetic  layers  that  influence  parental  imprinting  in 
the  inheritance  of  these  disorders.  This  chapter  summarizes  the  known  CNVs 
associated  with  human  neurodevelopmental  disorders  and  discusses  how  epigenetic 
mechanisms play a role in regulating the genes implicated in these loci. Furthermore,
we discuss the epigenetic layer of DNA methylation and its potential role in
breakpoint  instability  in  recent  primate  evolution. 


7.1  Introduction 


Chromosomal  rearrangements  include  several  different  classes  of  large  genetic 
changes, such as deletions, duplications, inversions, and translocations. Chromosomal
rearrangements are created when broken DNA double helices at two different
locations in the genome are joined in a repair attempt. This results in
chromosomal  gains  or  losses,  novel  structural  variations,  or  a  different  gene  order 
on  the  chromosome  (Griffiths  et  al.  1999).  Structural  variation  of  the  human 
genome  has  received  recent  attention  because  of  the  ability  to  sequence  across 
new  chromosomal  rearrangements  using  paired-end  next-generation  sequencing 
(Feuk  et  al.  2006;  Freeman  et  al.  2006).  Chromosome  rearrangements  have  been 
implicated in Mendelian disease inheritance, but also complex diseases and benign
variants  with  unknown  phenotypic  consequences  (Lupski  2006;  Lupski  and 
Stankiewicz  2005).  Interestingly,  many  recurrent  and  nonrecurrent  genomic 
rearrangements  result  in  copy  number  variation  (CNV)  of  critical  genes  involved 
in  the  pathogenesis  of  neurodevelopmental  disorders.  The  genetic  basis  of  disorders 
involving neurocognitive and behavioral phenotypes such as autism and schizophrenia
shows remarkable overlaps in some of the recurrent CNVs. Many of the
regions implicated in neurocognitive behavior also involve epigenetic regulation,
such as imprinting or X chromosome inactivation, or involve genes involved in
epigenetic  pathways. 

In  this  chapter,  we  summarize  the  CNVs  and  implicated  genes  in  common  and 
complex  neurological  traits,  with  a  particular  focus  on  the  15q  disorders.  In 
addition,  we  discuss  how  epigenetic  mechanisms  are  involved  in  both  the  etiology 
of  the  chromosomal  rearrangements  as  well  as  their  variable  manifestation  in 
disease  phenotypes.  Lastly,  we  focus  on  the  evolutionary  forces  that  appear  to  be 
acting on a particular hot spot of genomic rearrangements, the 15q11-q13 locus
implicated  in  multiple  human  neurodevelopmental  disorders. 


7.1.1  CNVs

Genetic  rearrangements  that  alter  chromosomal  structure,  including  inversions  and 
some  translocations,  may  also  result  in  gene  copy  number  differences  (Hurles 
et  al.  2008).  Duplications,  deletions,  and  complex  multisite  variants  (Feuk 
et  al.  2006),  collectively  referred  to  as  CNVs,  are  frequently  found  in  humans  and 
other  mammals  (Feuk  et  al.  2006;  Freeman  et  al.  2006).  CNV  is  the  most  prevalent 
type  of  structural  variation  in  the  human  genome  (Redon  et  al.  2006).  As  much  as 
12  %  of  the  human  genome  and  thousands  of  genes  are  variable  in  copy  number, 
and  this  diversity  is  likely  to  be  responsible  for  a  significant  proportion  of  normal 
phenotypic  variation  (Carter  2007).  Annotated  CNVs  are  mostly  larger  (typically 
>50  kb)  and  intermediate-size  structural  variation  (>500  bp)  (Carter  2007; 
Pinto  et  al.  2007). 

CNVs  can  be  de  novo  or  familial,  with  the  former  more  likely  to  contribute  to  the 
development  of  sporadic  genomic  disorders  (McCarroll  et  al.  2008).  Duplication  or 
deletion  of  large  chromosomal  segments  can  disrupt  a  variable  number  of  genes, 
resulting  in  alternate  gene  products  or  changes  in  allelic  expression.  Moreover,  the 
disruption  of  distal  regulatory  regions  in  the  genome  created  by  CNVs  or  structural 
rearrangements  can  also  lead  to  altered  gene  expression.  Duplications  and  deletions 
of  genes  affecting  inflammatory  response,  immunity,  olfactory  function,  and  cell 
proliferation  might  have  been  fixed  by  positive  selection  and  involved  in  the 
adaptive  phenotypic  differentiation  of  humans,  mice,  and  chimpanzees  (Nguyen 
et  al.  2006;  Perry  et  al.  2008;  Schaschl  et  al.  2009;  Young  et  al.  2008).  This  diversity 
is  likely  to  be  responsible  for  a  significant  proportion  of  phenotypic  variation  in 
humans  (Carter  2007). 
7.1.2  Neurodevelopmental Disorders

Neurodevelopmental  disorders  (NDs)  involve  the  abnormal  growth  and  development 
of  the  brain  or  central  nervous  system,  which  ultimately  affects  cognition  and/or 
behavior  throughout  life.  Neurodevelopmental  disorders  are  characterized  by 
neurological  and  psychiatric  signs  seen  in  the  course  of  brain  development — 
from conception to early adulthood (Grayton et al. 2012). These disorders encompass
a variety of signs and symptoms including a range of cognitive impairments
resulting  in  intellectual  disabilities  (ID)  that  are  often  first  characterized  as 
developmental  delay  (DD).  In  addition,  NDs  also  include  a  wide  range  of 
associated  behavioral  abnormalities  (such  as  hyperphagia  or  aggression),  psychotic 
symptoms,  autism  spectrum  disorder  (ASD),  sensory  impairment,  seizures,  motor 
dysfunction,  and  speech  and  language  difficulties  (Lee  and  Lupski  2006). 

The  chromosomal  rearrangements  observed  in  recurrent  CNVs  are  a  source  of 
interindividual  genetic  variation  that  could  explain  variable  penetrance  of  inherited 
(Mendelian  and  polygenic)  diseases  and  phenotypic  expression  of  sporadic  traits 
(Beckmann  et  al.  2007).  Table  7.1  summarizes  specific  CNVs  that  have  been 
identified  and  are  suspected  to  be  the  genomic  cause  of  several  neurodevelopmental 
disorders, including autism, intellectual disability, epilepsy, attention deficit hyperactivity
disorder (ADHD), and schizophrenia (Helbig et al. 2009; Mefford
et  al.  2008;  Sebat  et  al.  2007;  Stefansson  et  al.  2008;  Walsh  et  al.  2008).  However, 
the  genetic  inheritance  is  complicated  by  both  variable  expressivity  and  incomplete 
penetrance  associated  with  specific  CNVs.  Variable  expressivity  means  that  the 
same  CNV  may  have  different  clinical  manifestations,  such  as  resulting  in  autism  or 
epilepsy  or  schizophrenia  in  different  individuals.  Incomplete  penetrance  refers  to 
the  inheritance  patterns  of  CNVs  that  are  found  in  affected  individuals  but  also  at  a 
lower  frequency  in  apparently  unaffected  controls.  Since  the  genetic  correlation 
between  specific  CNVs  and  specific  neurodevelopmental  disorders  is  not  absolute, 
epigenetic  mechanisms  are  important  to  consider  in  their  etiology. 


7.1.3  Epigenetic Mechanisms Influencing CNVs

Epigenetics  can  be  defined  as  modifications  to  nucleotides  or  chromosomes  that  do 
not  change  the  genetic  sequence,  but  can  affect  gene  expression  and  phenotypic 
outcome  by  a  variety  of  mechanisms.  DNA  methylation  refers  to  addition  of  methyl 
groups  to  CpG  dinucleotides  in  the  mammalian  genome.  DNA  methylation  within 
the promoter of a gene is almost always associated with repression, while methylation
levels in gene bodies and non-genic regions are actually positively associated
with expression (Lister et al. 2009; Rauch et al. 2009). In addition to DNA
modification, the histone core proteins H3 and H4 are both heavily posttranslationally
modified on their N-terminal tails resulting in an epigenetic "histone code"
(Margueron  et  al.  2005;  Wang  et  al.  2004b).  Histone  marks  of  H3K4me3  and 
H3K36me3 and acetylation of H3 and H4 at multiple sites are observed at active
genes,  while  H3K27me3  and  H3K9me3  are  markers  of  a  repressed  chromatin  state 
(Wang  et  al.  2004b).  In  addition,  multiple  histone  modifications  and  variants 
influence the repair of double-stranded breaks, so the histone state can influence
susceptibility  of  the  genome  to  rearrangements  (Downs  et  al.  2007).  In  yeast, 
heterochromatin is less accessible to DNA repair events that can lead to chromosomal
rearrangements. Furthermore, genomic regions of differential low methylation
and silent gene expression were highly enriched for breakpoints and CNVs
(Tang  et  al.  2012),  and  the  regions  of  the  genome  with  hypomethylation  show 
increases  in  structural  rearrangements  (Li  et  al.  2012).  Large-scale  domains  of 
partial methylation called PMDs have been observed in cancer and primary embryonic
cell lines and are regions of heterochromatic silent marks and low levels of
transcription  (Li  et  al.  2012;  Lister  et  al.  2009;  Schroeder  et  al.  2011).  Together,  the 
emerging  evidence  suggests  that  the  local  chromatin  environment  can  influence  the 
occurrence  of  structural  rearrangements  selectively  in  heterochromatin. 

Two  classic  examples  of  epigenetic  mechanisms  influencing  phenotype  are 
parental  imprinting  and  X  chromosome  inactivation.  Autosomal  regions  affected 
by  parental  imprinting  exhibit  allele-specific  differences  in  gene  expression  based 
on  parental  origin  (Reik  and  Walter  2001;  Morison  et  al.  2005).  Opposite  patterns 
of  DNA  methylation  and  histone  states  differentially  mark  parentally  imprinted 
regions  primarily  through  one  or  more  imprinting  control  regions  (ICR)  that 
determine the parental specificity of gene expression in these loci (Soejima and
Wagstaff 2005). Similarly, the X chromosome in females is subject to X chromosome
inactivation (XCI) in order to achieve dosage compensation with males
(Payer  and  Lee  2008).  A  detailed  review  of  XCI  may  be  found  in  Chap.  2.  The 
main  difference  in  XCI  from  parental  imprinting  is  that  after  implantation  the 
choice  of  X  chromosome  is  random,  resulting  in  an  expected  50  %  maternally 
and  50  %  paternally  expressed  alleles.  However,  XCI  can  be  skewed  in  carriers  of 
X-linked  disease  genes  or  CNVs  (Robinson  et  al.  2001;  Knudsen  et  al.  2006).  Even 
in  women  who  are  not  obvious  carriers  for  disease-causing  variants,  skewed  XCI 
can  be  observed,  likely  because  of  stochastic  or  benign  genetic  variants 
(Hatakeyama  et  al.  2004).  Therefore,  CNVs  occurring  on  imprinted  loci  or  the  X 
chromosome  may  have  more  severe  effects  from  the  chromosomal  gain  or  loss 
because  only  one  allele  is  active. 

DNA methylation is an important epigenetic mechanism not only in the phenotypic
manifestation of CNVs but also potentially in their etiology. The human
genome  is  the  most  highly  methylated  and  highly  repetitive  of  mammalian  genomes, 
and  the  high  levels  of  DNA  methylation  have  been  considered  to  be  a  genome 
defense  mechanism  against  the  spread  of  retrotransposons  and  low  copy  repeats 
(LCRs)  (Bestor  and  Tycko  1996).  Importantly,  global  DNA  hypomethylation  is 
associated  with  genome  instability,  as  observed  in  many  tumor  types  exhibiting 
global DNA hypomethylation and in triplet repeat expansion (LaSalle 2011). Furthermore,
the 1 % of the human genome characterized as "methylation deserts" was
enriched  for  susceptibility  to  CNVs  (Li  et  al.  2012).  Interestingly,  many  different 
environmental  exposures  such  as  heavy  metals,  air  pollutants,  and  persistent  organic 
pollutants are not known mutagens, but are associated with global DNA
hypomethylation and could be indirectly predisposing to CNVs (Baccarelli and
Bollati 2009). In support of this possibility, high levels of the organic polychlorinated
biphenyl PCB-95 and reduced DNA methylation were observed in brain samples
from  individuals  with  maternally  derived  duplication  15q  syndrome  (Mitchell 
et  al.  2012b). 

Additional  epigenetic  mechanisms  with  potential  relevance  to  the  etiology  of 
CNVs  in  the  human  genome  are  the  structural  layers  of  chromatin  loops  and 
three-dimensional  nuclear  organization.  The  linear  chromosomal  maps  do  not 
necessarily  reflect  the  spatial  organization  of  genes  within  the  interphase  nucleus 
(Kosak  and  Groudine  2004).  The  recent  ENCODE  project  has  demonstrated  that 
long-range interactions between distant genes on the same or different
chromosomes are quite common in the human genome and are cell type dependent
(Dunham et al. 2012; Horike et al. 2005; Meguro-Horike et al. 2011a; Yasui
et al. 2011). The structural CCCTC-binding factor (CTCF) and the methyl-CpG-binding
protein 2 (MeCP2) are both nuclear factors with genome-wide
distribution  in  neurons  that  regulate  chromatin  looping  and  nuclear  organization 
(Kernohan  et  al.  2010).  Therefore,  existing  CNVs  may  alter  gene  expression 
beyond  the  breakpoint  boundaries  because  of  long-range  chromatin  effects.  In 
addition, some genetic loci distally located in the genome may be more susceptible
to the formation of CNVs because of their colocalization in the interphase
nucleus. 


7.2    CNVs  Associated  with  Neurodevelopmental  Disorders 
7.2.1    CNVs in Chromosome 1 and Associated ND

Deletion of 1p36 is the most common terminal deletion syndrome in humans,
occurring in 1 in 5,000 live births (Gajecka et al. 2010;
Rosenfeld et al. 2010). Different deletion sizes are associated with somewhat
different clinical manifestations. Features common to all 1p36 monosomies
include ID, DD, hypotonia, dysmorphic facial features, and microcephaly.
Individuals with more proximal 1p36.33 deletions exhibit a phenotype similar to
Prader-Willi syndrome (PWS) with hyperphagia and obesity (D'Angelo et al. 2006,
2009, 2010). This proximal locus includes the gene encoding v-ski sarcoma viral
oncogene homolog (SKI), a transcriptional regulator required for the maintenance
of the neural stem cell pool and the development of the corpus callosum. SKI
appears to function as a transcriptional repressor through interactions with the
chromatin-remodeling factor SATB homeobox 2 (SATB2) and histone deacetylase
(Baranek and Atanasoski 2012).

Deletions at 1q21.1 show nominal association with schizophrenia (Stefansson
et al. 2008). Previously reported 1q21.1 deletions in two cases of ID (De Vries
et al. 2005; Sharp et al. 2006), two autistic individuals (Weiss et al. 2008), and one
schizophrenia case (Walsh et al. 2008) are consistent with the shorter form of the
deletion identifying the causative locus. In at least four reports (Brzustowicz
et al. 2000; Gurling et al. 2001; Hwu et al. 2003; Zheng et al. 2006) the 1q21
locus has been linked to schizophrenia. The 1.35 Mb deleted segment common to
both the large and the small form of the 1q21.1 deletion is gene rich (Stefansson
et al. 2008), containing 27 known genes, most of which are expressed in the brain
(ISC 2008), and previous reports have shown linkage of this locus to ID
(Brunetti-Pierri et al. 2008). The gene encoding gap junction protein, alpha 8 (GJA8), is
expressed in brain, is located in a repeat region within the boundary of the 1.35 Mb
deletion segment, and was previously reported to be associated with schizophrenia
(Ni et al. 2007). Recurrent reciprocal microdeletions and microduplications within
1q21.1 represent novel genomic disorders characterized by microcephaly or
macrocephaly, respectively, and can manifest with a range of developmental
delay, neuropsychiatric abnormalities, dysmorphic features, and a variety of other
congenital anomalies (Brunetti-Pierri et al. 2008).


7.2.2    CNVs  in  Chromosome  2  and  Associated  ND 

A case study reported that a child with developmental delay, unusual autistic-like
behaviors, multiple vertebral anomalies, and an unusual facial appearance was
found by array comparative genomic hybridization (aCGH) to have a de novo 321 kb
deletion of chromosome 2p16.3 that removed a 5′ portion of NRXN1, encoding
neurexin 1, a neuronal cell adhesion molecule (Zahir et al. 2008). Another report
implicated NRXN1 in two independent subjects with ASD and a balanced chromosomal
rearrangement at 2p16.3 (Kim et al. 2008). Furthermore, NRXN1 missense
variants were associated with autism in a case-control study (Feng et al. 2006).
Deletions of 2p16.3 that disrupt NRXN1 were also observed in schizophrenia
(Kirov et al. 2008; Rujescu et al. 2009). A 115-kb deletion on chromosome
2p16.3 disrupting NRXN1 was also found in identical twins concordant for
childhood-onset schizophrenia (Walsh et al. 2008).


7.2.3    CNVs  in  Chromosome  3q  and  Associated  ND 

Patients  with  autism  and  3q29  microdeletion  also  exhibited  ataxia,  chest-wall 
deformity,  and  long  and  tapering  fingers  (Willatt  et  al.  2005).  The  1.5  Mb  3q29 
microdeletion encompasses 22 genes, including PAK2 and DLG1, which are autosomal
homologs of two known X-linked mental retardation genes, PAK3 and
DLG3.  Another  study  found  six  3q29  microdeletions  among  7,545  schizophrenic 
subjects  compared  to  1  out  of  39,748  controls,  resulting  in  a  statistically  significant 
association  with  schizophrenia  (Mulle  et  al.  2010). 
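
As a rough check on the 3q29 counts quoted above (6 carriers among 7,545 cases versus 1 among 39,748 controls), the odds ratio and a one-sided Fisher's exact p-value can be computed directly from the 2x2 table. The sketch below is illustrative only: it uses the Python standard library, the function names are hypothetical, and it is not the analysis performed by Mulle et al. (2010).

```python
from math import comb


def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Sample odds ratio from a 2x2 table of carriers vs. non-carriers."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    return (a * d) / (b * c)


def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """P(at least the observed number of case carriers) under the hypergeometric null."""
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denom = comb(total, carriers)
    upper = min(carriers, case_total)
    p = 0.0
    for k in range(case_carriers, upper + 1):
        # Ways to place k carriers among cases and the rest among controls.
        p += comb(case_total, k) * comb(control_total, carriers - k) / denom
    return p


# Counts as quoted in the text for 3q29 microdeletions.
OR_3Q29 = odds_ratio(6, 7545, 1, 39748)
P_3Q29 = fisher_one_sided(6, 7545, 1, 39748)
print(f"odds ratio ~ {OR_3Q29:.1f}, one-sided p ~ {P_3Q29:.1e}")
```

With only seven carriers in total, the point estimate of the odds ratio is imprecise, which is why an exact test is used here instead of a large-sample approximation.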


7.2.4    CNVs in Chromosome 6 and Associated ND

A large 24 Mb deletion in 6q22.1-q23.2 was reported in an infant with DD,
microcephaly, facial dysmorphism, pulmonary atresia, and ventricular septal defect
(Rosenfeld et al. 2012). More recently, analysis of seven individuals with ND refined the
deletion locus at 6q22.1 to a 250 kb cluster of four genes, including MARCKS,
HDAC2, and HS3ST5, which are implicated in neurodevelopment (Rosenfeld
et al. 2012). HDAC2 encodes histone deacetylase 2, a protein important in
epigenetic gene repression.


7.2.5    CNVs  in  Chromosome  7  and  Associated  ND 

Multiple studies have shown that the q31 region on chromosome 7 co-segregates with
speech and language disorders. Chromosomal rearrangements involving 7q31
(including translocations, inversions, and a duplication) have been observed in
patients with autism (IMGSAC 2001; Ashley-Koch et al. 1999). SPCH1 on
human 7q31 was associated with speech and language disorder (Lai et al. 2000).
The SPCH1 interval overlaps with an ~40-cM region identified in a genome screen for
susceptibility to autism, a disorder which is often associated with speech and
language abnormalities (Consortium 1998; Fisher et al. 1998). The gene encoding
the FOXP2 transcription factor is located in the same locus and is also implicated in
speech and language disorders. A mother and son with FOXP2 haploinsufficiency
carried a minimal 1.57-Mb deletion on chromosome 7q31 that also included the
genes MDFIC and PPP1R3A (Rice et al. 2012). The boy had severe
childhood  apraxia  of  speech,  with  poor  expressive  speech,  severely  delayed  speech 
acquisition,  and  inability  to  laugh,  sneeze,  or  cough  spontaneously.  He  showed 
mildly  impaired  cognition,  which  may  have  been  due  to  the  speech  limitations.  His 
24-year-old  mother  was  similarly,  if  slightly  less,  affected.  She  had  a  similar  early 
developmental  history,  with  speech  apraxia  and  mild  developmental  delay.  Neither 
patient  had  autistic  features.  Tissue-specific  methylation  differences  have  been 
observed  at  FOXP2  that  may  complicate  gene  association  studies  (Tolosa 
et  al.  2010). 

Several genes on chromosome 7q11.23 are dosage sensitive and play a
role in human language. Deletion of chromosome 7q11.23 causes Williams-Beuren
syndrome (WBS), characterized by spatial learning deficits and aberrant social
behaviors (Bozhenok et al. 2002; Amir et al. 1999). A recurrent microdeletion of
around 1.6 Mb results in loss of 28 genes central to WBS, including CLIP2, ELN,
GTF2I, GTF2IRD1, and LIMK1 (Pober 2010). Loss of the ELN gene, which codes
for the protein elastin, is associated with the connective-tissue abnormalities and
cardiovascular disease (specifically supravalvular aortic stenosis and supravalvular
pulmonary stenosis) found in many people with this syndrome (Curran et al. 1993;
Johnson et al. 1976). At least two of the relatively uncharacterized genes within the
WBS deletion have predicted epigenetic functions, including WBSCR22 that
encodes a putative methyltransferase and WBSCR9 that encodes a transcriptional
regulator (Doll and Grzeschik 2001; Merla et al. 2002; Peoples et al. 1998).


7.2.6    CNVs  in  Chromosome  16  and  Associated  ND 

Recurrent rearrangements of the 16p11.2 locus exhibit both
reduced penetrance and variable expressivity. Both microdeletions and
microduplications of chromosome 16p11.2 have been implicated in autism:
a study within the Autism Genetic Resource Exchange
(AGRE) population identified five instances of de novo deletions on chromosome
16p11.2 as well as the reciprocal duplication of the 593 kb deleted region
(Weiss et al. 2008). However, these microdeletions and duplications are also observed in around 1 %
of controls (IMGSAC 2001; de Kovel et al. 2010; McCarthy et al. 2009; Shinawi
et al. 2010; Weiss et al. 2008). This deleted or duplicated region on 16p11.2
contains 25 annotated genes. Some recent reports also showed that
microduplications of 16p11.2 are associated with schizophrenia and ADHD
(McCarthy et al. 2009; Shinawi et al. 2010). The gene KCTD13, encoding polymerase
delta-interacting protein 1 and predicted to be involved in neurogenesis, was
identified as the sole contributor to the microcephaly phenotype of 16p11.2 CNVs
in a screen of overexpression in zebrafish embryos (Golzio et al. 2012).


7.2.7    CNVs  in  Chromosome  17  and  Associated  ND 

Microdeletion of 17p13.3 causes Miller-Dieker lissencephaly syndrome (MDLS;
OMIM #247200). The MDLS deleted region includes the lissencephaly-1 (LIS1)
gene, PAFAH1B1 (Dobyns et al. 1993). MDLS is characterized by the brain
malformation associated with deletion of LIS1 (PAFAH1B1), and includes abnormal
facial appearance and severe ID. MDLS patients usually have large deletion
intervals (more than 1.3 Mb) and show a more severe grade of lissencephaly,
likely due to the inclusion of particular genes other than PAFAH1B1 in the deletion
interval (Cardoso et al. 2003).

Microdeletions of 3.7 Mb on 17p11.2 cause Smith-Magenis syndrome (SMS;
OMIM #182290), a developmental disorder (Chen et al. 1997; Greenberg
et al. 1991; Shaw et al. 2004). SMS is characterized by multiple congenital
anomalies, mental retardation, a variable degree of developmental delay, behavioral
and physical abnormalities such as hearing impairment, and minor skeletal
and craniofacial defects (Greenberg et al. 1991, 1996). Although this region
contains multiple genes, loss of the retinoic acid induced 1 gene (RAI1) is responsible
for most of the characteristic features of this condition (Girirajan et al. 2006).
Other genes within the deleted region on chromosome 17 also contribute to the variability and
severity of the clinical features. The loss of these other genes may
help explain why the features of SMS vary among affected individuals.

CNVs are also observed in the q11.2 region on chromosome 17. Recurrent 1.5 Mb
interstitial microdeletions, which include the NF1 tumor-suppressor gene, are found
in 5-20 % of patients with autosomal dominant neurofibromatosis type 1 (NF1;
OMIM #162200) (Cnossen et al. 1997; Rasmussen et al. 1998). This recurrent
contiguous gene deletion (type I) encompasses at least 13 genes other than NF1
and arises at rearrangement hot spots contained within the NF1REP-P1 and
NF1REP-M LCRs (Forbes et al. 2004; Lopez-Correa et al. 2001). NF1 is primarily
characterized by multiple benign nerve-sheath tumors or neurofibromas and pigmentary
changes (Korf 2002). NF1 encodes a transcription factor that binds to
epigenetically dysregulated targets in human cancers (Nischal et al. 2012).


7.2.8    CNVs  in  Chromosome  22  and  Associated  ND 

A microdeletion of 22q11.2 causes velo-cardiofacial syndrome (VCFS; OMIM
#192430), which is also known as DiGeorge syndrome (DGS; OMIM #188400).
VCFS is associated with developmental delay and a wide range of cognitive and
neurological deficits, including deficits in speech, language, memory, and attention
(Bearden et al. 2001; El Tahir et al. 2004; Lynch et al. 1995; Moss et al. 1999).
The 3 Mb deletion region causing VCFS contains the gene encoding
catechol-O-methyltransferase (COMT), an enzyme responsible for the degradation of
dopamine. The region contains many other genes thought to be involved in brain function and
neurodevelopment, including TBX1, GNB1L, ZDHHC8, DGCR8, RANBP1, and
CDC45L (Meechan et al. 2009; Paylor et al. 2006; Williams et al. 2008). Some
VCFS individuals with 22q11.2 deletion were also found to have schizophrenia
(Murphy and Owen 2001), suggesting that there could be a link between the other
genes of this region and psychiatric disorders. Interestingly, DGCR8 encodes a
miRNA biogenesis protein that regulates miR-185, which in turn targets the DNA
methyltransferase gene DNMT1, resulting in global hypomethylation in human
glioma (Zhang et al. 2011; Earls et al. 2012).


7.2.9    CNVs  in  Chromosome  X  and  Associated  ND 

The X chromosome has both a genetic and an epigenetic connection to neurodevelopmental
disorders, resulting in differences in disease penetrance in males versus
females. Duplications of chromosome Xq28 are observed in males with ID,
autism, and immune abnormalities, and the primary gene implicated is MECP2
(Kirk et al. 2009; Lugtenberg et al. 2009; Prescott et al. 2009; Van Esch
et al. 2005b; Velinov et al. 2009). While point mutations in MECP2 are the
most frequent cause of Rett syndrome (RTT) (Amir et al. 1999b), a rare genomic
deletion of the MECP2 gene has also been observed (Ravn et al. 2005). RTT (Amir
et al. 1999b) is a neurodevelopmental disorder that mostly affects females.
Girls with classical RTT exhibit apparently normal development until 6-18
months of age, followed by a regressive stage characterized by deceleration of
head growth and loss of speech and acquired motor skills. Male cases of RTT are
very rare, limited to exceptional cases of somatic mosaicism or X chromosome
aneuploidy (Moog et al. 2003). Five critical genes were found within the 200 kb
minimal duplication region, including L1CAM and MECP2 (del Gaudio
et al. 2006; Meins et al. 2005; Van Esch et al. 2005a). The phenotypic severity
of Xq28 duplications depended on dosage of MECP2 more than on duplication size.
MECP2 is subject to X chromosome inactivation in females and also encodes an
epigenetic factor that binds to methylated DNA.


7.3    15q  Chromosomal  Rearrangement  and  Related 
Neurodevelopmental  Disorder 

15q11-q13 is a notable chromosome region, because it is both highly epigenetically
regulated and characterized by multiple structural abnormalities such as deletions,
duplications, triplications, and translocations (Hogart et al. 2008). Chromosome
15 is also enriched in segmental LCRs (Bailey et al. 2002), and LCR-mediated
misalignment leads to unequal nonallelic homologous recombination, which generates a
series of common breakpoints (BPs) along 15q11.2-q13 (Christian et al. 1999).
There are three BP clusters located in proximal 15q that correspond to complex
LCRs within 50-500 kb (Fig. 7.1) (Amos-Landgraf et al. 1999; Makoff and Flomen
2007). The repeats at BP1-BP3 show limited sequence homology to each other, but
the two more distal BP clusters (BP4 and BP5) involve a distinct set of LCRs
(Makoff and Flomen 2007). This complex structure of tandem and inverted LCRs
on proximal 15q contributes to a variety of recurrent and more complex deletions
and duplications observed in neurodevelopmental disorders (Hogart et al. 2010). In
addition to the complexity of the genetic backbone of chromosome 15q11-q13, the
center of this locus is also subject to parental imprinting, an epigenetic phenomenon
associated with parental allele-specific differences in gene expression, DNA
methylation, and chromatin organization.

The imprinted gene expression observed in 15q11-q13 has important
implications  for  the  inheritance  patterns  of  the  human  diseases  associated  with 
this locus. Three different neurodevelopmental disorders map to this region,
including PWS, Angelman syndrome (AS), and 15q duplication syndrome (de Kovel
et al. 2010; Doornbos et al. 2009; Helbig et al. 2009; Miller et al. 2009; Murthy
et al. 2007; Sharp et al. 2008; Stefansson et al. 2008). Large deletions of around 5 Mb
are responsible for approximately 70 % of cases of PWS and AS. These
LCR-mediated deletions occur during meiosis at breakpoints 1-3 (BP1-BP3,
Fig. 7.1) (Amos-Landgraf et al. 1999; Christian et al. 1999). PWS results from
paternal inheritance of 15q11-q13 deletions, while AS is caused by maternal
inheritance of the same genetic deletion (Hogart et al. 2008). As a reciprocal
event, 15q duplication syndrome arises either at BP2-BP3, as in AS and PWS,
or at similar repetitive sequences at two other breakpoints, BP4-BP5 (Fig. 7.1), by
unequal homologous recombination (Wang et al. 2004a).

Fig. 7.1 Schematic map of chromosome 15q11.1-14 showing the positions of some known genes
that are involved in neurodevelopmental disorders caused by chromosomal rearrangement. This
image was taken from the UCSC Genome Browser (http://genome.ucsc.edu). The green bars
indicate the five breakpoints (BP1-5) and the Prader-Willi and Angelman imprinting control
regions (ICR) (Hogart et al. 2008)

15q11-q13 duplication has been estimated to occur in 1-3 % of individuals with ASD,
making it one of the most frequent cytogenetic abnormalities (Cook et al. 1997;
Moeschler et al. 2002). The inheritance of this duplication shows a parent-of-origin
effect, as paternal inheritance results in lack of an overt phenotype. One recent
study showed that the frequency of gains of 15q11.2-q13 (BP2-BP3) in ASD
populations is close to 1 in 500 (Moreno-De-Luca et al. 2012), whereas it had
previously been reported as 1 %. This difference could reflect the severe phenotype conferred by
these gains (Moreno-De-Luca et al. 2012).
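
To put the two frequency estimates above on the same scale, a "1 in N" figure can be converted to a percentage. The following sketch is purely arithmetic; the helper name is hypothetical and the inputs are the figures quoted in the text.

```python
def one_in_n_to_percent(n):
    """Convert a '1 in n' frequency to a percentage."""
    return 100.0 / n


recent_pct = one_in_n_to_percent(500)  # estimate attributed to Moreno-De-Luca et al. 2012
earlier_pct = 1.0                      # previously reported figure, already in percent
fold_difference = earlier_pct / recent_pct
print(f"{recent_pct:.1f} % vs {earlier_pct:.1f} % ({fold_difference:.0f}-fold difference)")
```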


7.4    Genetics  and  Epigenetics  in  the  15q  Deletion 
and  Duplication  Disorders 

7.4.1    Angelman  Syndrome 

Angelman  syndrome  (OMIM  105830)  is  a  neurogenetic  disorder  caused  by  loss  of 
function  of  the  maternally  inherited  allele  of  UBE3A  (Kishino  et  al.  1997).  While 
other  genes  within  the  large  deletion  may  modulate  the  severity  of  AS,  patients  with 
maternally  inherited  mutations  of  UBE3A  have  defined  this  single  gene  as  the  major 
contributor  to  AS.  Furthermore,  UBE3A  shows  imprinted  expression  in  selected 
postnatal  neuronal  populations,  with  exclusive  expression  from  the  maternal  allele, 
thus explaining the maternal-specific origin of AS (Kishino et al. 1997; Matsuura
et al. 1997). However, the actual genetic and epigenetic causes of AS are diverse (Lossie
et al. 2001). Around 70 % of AS cases are caused by maternal deletion of
15q11-q13 (Knoll et al. 1989), ~10 % by maternal UBE3A mutations (Fang
et al. 1999), ~10 % by paternal uniparental disomy (UPD) of chromosome
15 (Malcolm et al. 1991), and the remainder by microdeletions or epigenetic
mutations causing a paternal methylation pattern on the maternally inherited
chromosome, collectively called imprinting mutations (Buiting et al. 2003;
Glenn et al. 1993; Reis et al. 1994).
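
The share of AS cases left for imprinting mutations follows from the approximate figures quoted above. The sketch below simply subtracts the stated shares from 100 %; because the inputs are rounded estimates, the result is only a rough indication, and the variable names are hypothetical.

```python
# Approximate shares of AS cases by molecular cause, in percent, as quoted in the text.
as_causes_pct = {
    "maternal 15q11-q13 deletion": 70,
    "maternal UBE3A mutation": 10,
    "paternal UPD of chromosome 15": 10,
}

# Whatever is not accounted for above is attributed to imprinting mutations.
imprinting_mutations_pct = 100 - sum(as_causes_pct.values())
print(f"imprinting mutations: roughly {imprinting_mutations_pct} % of AS cases")
```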

Angelman syndrome patients exhibit ataxia, microcephaly, frequent seizures,
and profound learning disabilities coupled with short attention span, absent speech,
and a characteristic happy demeanor (Cassidy et al. 2000; Lossie et al. 2001).
Language skills in AS are severely impaired, remaining at or below the level of a
2-year-old throughout life (Andersen et al. 2001; Penner et al. 1993). The
AS critical region is 35 kb telomeric to the PWS critical region (Lalande and
Calciano 2007a).

The  AS  gene,  UBE3A,  encodes  a  member  of  a  class  of  functionally  related  E3 
ubiquitin-protein  ligases  and  is  expressed  from  both  parental  alleles  in  most  tissues 
(Huibregtse  et  al.  1995)  except  the  brain,  where  UBE3A  is  exclusively  expressed 
from  the  maternally  inherited  allele  (Rougeulle  et  al.  1998).  So,  specifically  in  the 
postnatal  brain,  mutation  or  deletion  of  maternal  UBE3A  results  in  a  complete  loss 
of  the  ubiquitin  ligase  activity  (Yamasaki  et  al.  2003b).  The  mechanism  of  UBE3A 
imprinting is complex and involves multiple epigenetic layers. First, a maternally
inherited imprinting center (AS-ICR, Fig. 7.1) is required for methylation and
silencing of the PWS-ICR on the maternal allele in early development (Perk
et al. 2002; Johnstone et al. 2005). Differential histone modifications and binding
of MeCP2 that distinguish the maternal and paternal alleles are also observed at the PWS-ICR
(Thatcher et al. 2005; Fulmer-Smentek and Francke 2001; Gregory
et al. 2001; Makedonski et al. 2005). The second requirement for UBE3A imprinting
is the neuronal transcription of an extremely long transcript originating from
the paternal PWS-ICR, running through SNRPN and multiple noncoding RNA clusters, and
ending with an antisense transcript of UBE3A (UBE3A-AS) (Rougeulle et al. 1998;
Yamasaki et al. 2003a; Lalande and Calciano 2007b). While all cell types express
the protein-coding transcript for SNRPN, only postnatal neurons transcribe
completely through to the UBE3A-AS in order to turn off the paternal allele of
UBE3A.  Interestingly,  a  recent  screen  for  small  molecules  that  could  reestablish 
paternal  Ube3a  expression  in  mouse  neurons  identified  several  inhibitors  of 
topoisomerases,  enzymes  that  function  to  change  the  topology  of  DNA  structures 
(Huang  et  al.  2012). 


7.4.2    Prader-Willi  Syndrome 

PWS (OMIM 176270) is primarily a neurodevelopmental disorder, but it also
affects multiple organs and metabolism. The opposite of AS, PWS is caused by
the loss of paternal 15q11-q13. Most PWS patients have a large deletion of
paternal 15q11-q13, while other patients have either maternal UPD or an
imprinting defect in which the PWS-ICR is lacking (Chamberlain and
Lalande 2010). The typical symptoms of PWS are hypotonia and failure to thrive
in infancy, followed by hyperphagia leading to morbid obesity in childhood.
Other common symptoms of PWS include sleep abnormalities, small stature
with small hands and feet, and obsessive-compulsive disorder (Cassidy and
Driscoll 2009).

While there are multiple paternally expressed protein-coding genes in the PWS
locus, including MKRN3, MAGEL2, NDN, and SNURF-SNRPN, the most recent
evidence from both human and mouse genetics has pointed to a locus containing
repeated units of noncoding C/D box small nucleolar RNA (snoRNA) genes
(Hogart et al. 2010). PWS patients lacking the SNORD116 (HBII-85)
snoRNA-encoding locus suffer from the same failure to thrive, hypotonia, and hyperphagia
that are observed in patients with larger deletions and maternal UPD (De Smith
et al. 2009; Sahoo et al. 2008; Duker et al. 2011). Two different mouse models, both
with deletions specifically encompassing Snord116 but not the adjacent Snord115
cluster, have demonstrated a recognizable phenotype of reduced growth and altered
metabolism that characterizes the infant-stage PWS phenotype (Skryabin et al. 2007;
Ding et al. 2008). While some earlier studies suggested a role for the SNORD115
cluster in PWS (Doe et al. 2009; Kishore and Stamm 2006), paternal deletion of this
locus does not appear sufficient to cause PWS (Ding et al. 2005).

Similar to the epigenetic mechanisms for AS, the epigenetic mechanisms
regulating and involving the Snord116 locus are decidedly complex. Snord116
transcription is controlled from the PWS-ICR, which is methylated and silenced
on the maternal allele in all tissues. In neurons, transcription progresses beyond
Snrpn, through Snord116 and Snord115, eventually reaching the Ube3a-as.
Interestingly, there is emerging evidence that the different noncoding RNAs encoded
within Snord115 and Snord116 may act in dual roles to open chromatin structure
and mediate nucleolar maturation during neural development (Leung et al. 2009;
Vitali et al. 2010).


7.4.3    15q  Duplication  Syndrome 

15q duplication syndrome is a clinically identifiable syndrome that occurs in two
subtypes based on the type of chromosomal rearrangement. Isodicentric chromosome
15 (idic(15)) refers to a supernumerary chromosome (Battaglia 2005).
Isodicentric chromosomes can occur via U-type crossover events in meiosis
(Robinson et al. 1998), which form a supernumerary derivative chromosome
15 with two centromeres (Hogart et al. 2010). Interstitial duplications of proximal
15q11.2-q13 occur without changing the chromosome number, and are the reciprocal
rearrangements of the deletions observed in PWS and AS, arising through duplication of
BP1-BP3 or BP2-BP3 (Repetto et al. 1998). Some interstitial duplications and
idic(15) rearrangements involve the distal LCRs, extending to BP4
and BP5 (Wandstrat et al. 1998).

15q duplications have emerged as the single most common causative CNV
found in autism at 1-3 % (Cook et al. 1998; Schroer et al. 1998). In addition to
autism, individuals with 15q duplication syndrome exhibit hypotonia at birth, delayed motor
skills and language development, cognitive and learning disabilities, and
epilepsy (Chamberlain and Lalande 2010). Some individuals also present with
anxiety, hyperactivity, and short stature (Battaglia 2005). Individuals with
idic(15) are more severely affected than interstitial duplication patients, because they have three
copies of maternal 15q11-q13, whereas interstitial duplication patients have two
copies (Battaglia 2008).
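
The copy-number reasoning above can be tallied explicitly. The sketch below assumes a maternally derived rearrangement on top of one normal maternal and one normal paternal copy of 15q11-q13; the function name and the rearrangement labels are hypothetical and chosen only for illustration.

```python
def copies_15q11_q13(rearrangement):
    """Return (maternal, paternal) copies of 15q11-q13.

    Assumes the rearrangement is maternally derived and that each
    individual also carries one normal maternal and one normal paternal copy.
    """
    extra_maternal = {
        "none": 0,
        "interstitial duplication": 1,  # one extra copy within chromosome 15
        "idic(15)": 2,                  # supernumerary chromosome carrying two copies
    }
    maternal = 1 + extra_maternal[rearrangement]
    paternal = 1
    return maternal, paternal


for kind in ("none", "interstitial duplication", "idic(15)"):
    maternal, paternal = copies_15q11_q13(kind)
    print(f"{kind}: {maternal} maternal + {paternal} paternal = {maternal + paternal} copies")
```

Under these assumptions idic(15) gives four copies in total (tetrasomy) and an interstitial duplication gives three, matching the maternal copy counts stated in the text.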

One report showed that a boy with autism, epilepsy, ataxia, and an interstitial
duplication of 15q had a duplication of the GABRA5 and GABRB3 genes (Bundey
et al. 1994). A 15q13.1 duplication spanning APBA2 was also reported in a
patient with schizophrenia (SZ) (Kirov et al. 2008). A common chromosomal
rearrangement is an inverted duplication resulting in tetrasomy for 15q11-q13, which causes
a more severe phenotype (Battaglia 2008).

Interestingly, the clinical variability of 15q duplication syndrome is less related
to differences in genetic breakpoints than to epigenetic or stochastic factors. Two
studies comparing brain expression levels suggest that most 15q-encoded genes do
not act according to imprint and copy number, with the exception of UBE3A, which
was increased in all brain samples (Hogart et al. 2009; Scoles et al. 2011). The
paternal transcript SNRPN and the biallelically expressed GABRB3 were actually
decreased compared to controls, an unexpected result based on predicted imprinting
patterns and copy number. A recent study also showed reduced transcript levels of
NDN, SNRPN, GABRB3, and CHRNA7 in a 15q duplication neuronal model that
recapitulated the altered transcription levels observed in postmortem brain
(Meguro-Horike et al. 2011b).


7.5    Evolutionary  Considerations  of  CNVs 
and  Neurodevelopment 

Both  common  and  low  copy  genetic  repeats  are  the  major  risk  factors  for  recurrent 
CNVs  that  arise  in  ND,  as  transposons,  simple  sequence  repeats,  processed 
pseudogenes,  and  tandemly  repeated  sequences  together  make  up  over  a  third  of 
the human genome (Zepeda-Mendoza et al. 2010). Intriguingly, the primate lineage
has quite recently evolved a unique group of LCRs (also called segmental duplications)
implicated in the recurrent CNV rearrangements. Primate-specific LCRs are unique
because their breakpoints are especially CpG dense due to their enrichment in the
Alu class of repeats (Conrad et al. 2010).

In a comparison of chromosomal rearrangements between human and
chimpanzee (Fig. 7.2), only about half of the human chromosomes show structural
divergence from chimpanzee. Chromosome 15q11-q13 emerges as the major hot
spot for chromosomal structural differences by this analysis. The divergent
sequences on 15q11-q13 correspond to the major breakpoint regions discussed in
the previous section as being the sites of rearrangement leading to the
15q-associated NDs. The recent evolutionary changes in human proximal 15q
may also begin to explain why the various mouse models of 15q-associated NDs
do not fully recapitulate the phenotypes, despite a strong conservation of the
protein-coding gene sequences and imprinting patterns.

Since 25 % of the total DNA methylation in human cells is at Alu sequences (Xie
et al. 2009), the epigenetic mechanism of DNA methylation is predicted to be a
major factor in predisposing Alu-rich LCRs to CNV rearrangements in recent
evolution. In support of an epigenetic role in primate LCR origins, the hybrid
primate species of white-cheeked gibbons are hypomethylated at Alu sites compared
to other primates, correlating with increased structural variants with
breakpoints corresponding to the lowest methylated Alu sites (Carbone et al. 2009).

Therefore,  it  perhaps  is  not  surprising  that  LCRs  and  the  CNVs  that  arise  from 
these  Alu-rich  repeats  may  play  an  important  role  in  primate  evolution  and  species
differences  in  levels  of  brain-expressed  transcripts.  The  transcriptome  of  human
brain  shows  many  brain-specific  transcript-level  differences  compared  to  other
primates  (Caceres  et  al.  2003;  Enard  et  al.  2002).  The  regions  with  the  biggest
differences  in  gene  neighborhood  and  brain  expression  between  human  and  chimpanzee
(De  et  al.  2009)  are  also  the  human  chromosomal  regions  significantly
enriched  for  LCRs  (Jiang  et  al.  2007),  including  15q11-q13.


7.5.7    Combinations  of  CNVs  and  Total  CNV  Burden  in  ND 

In  addition  to  their  recent  evolutionary  differences,  LCR  repeat  blocks  are  also 
quite  polymorphic  between  individual  human  genomes.  Approximately  3  %  of 
assignable  duplications  are  predicted  to  be  unique  to  an  individual,  even  though 
the  largest  LCR  duplication  blocks  are  invariant  (Alkan  et  al.  2009).  Since  CNVs 
are  found  at  some  level  in  all  human  genomes,  the  assignment  of  specific  CNVs  to 
specific  neurodevelopmental  disorders  is  often  problematic.  More  recently, 
combinations  of  CNVs  have  been  explored  in  order  to  investigate  the  whole- 
genome  burden  of  CNVs  in  individuals  with  NDs.  Whole  chromosomal  differences 
may  contribute  to  hypomethylation  leading  to  further  rearrangements,  so  it  is 
important  to  investigate  both  environmental  contributors  to  DNA  hypomethylation 
as  well  as  combinations  of  CNVs  in  the  future. 

One  recent  study  examined  the  total  burden  of  CNV  gains  and  losses  in 
individuals  with  a  range  of  ND,  including  autism,  ID,  or  dyslexia,  compared  to 
controls  and  found  that  the  largest  CNV  burden  correlated  with  the  severity  of 
the  disorder  (Girirajan  et  al.  2011).  These  types  of  studies  will  be  important  in  the 
future  to  improve  the  understanding  of  the  link  between  CNVs  and  ND  beyond 
the  syndromic  forms.  The  other  question  that  investigations  of  total  CNV  burden
raise  is  what  environmental  exposures  may  predispose  to  CNVs  and  the  DNA
hypomethylation  associated  with  increased  rearrangements  of  LCRs.  In  our  recent
analysis  of  persistent  organic  pollutant  exposures  in  postmortem  brain  samples  of
ND,  we  observed  an  unexpected  association  between  PCB-95  exposures  and 
chromosome  15q  duplication  syndrome  and  DNA  hypomethylation  (Mitchell 
et  al.  2012a).  Future  studies  will  be  needed  to  determine  the  causality  of  specific 
environmental  exposures  with  increased  recurrent  CNV  occurrences  in  human  cells 
and  in  transgenerational  studies.


Chapter  8

Impact  of  the  Early-Life  Environment 
on  the  Epigenome  and  Behavioral 
Development 


Benoit  Labonte  and  Gustavo  Turecki 


Abstract  The  environment  in  which  we  live,  and  especially  the  early-life 
environment,  regulates  our  behavioral  development.  Adversity  during  early  life  is 
strongly  associated  with  problems  in  behavioral  regulation  and  psychopathology  in 
adulthood.  Until  recently,  the  mechanisms  responsible  for  behavioral  changes 
induced  by  early-life  adversity  were  not  clear.  However,  recent  evidence  suggests 
that  early-life  environment  induces  behavioral  changes  through  epigenetic 
mechanisms  controlling  the  expression  of  genes  involved  in  the  regulation  of 
behavior.  Thus,  the  epigenome  mediates  the  effects  of  environmental  variability 
on  behavioral,  physiological,  and  pathological  responses  increasing  vulnerability 
toward  suicidal  behaviors.  Numerous  findings  in  animals  and  humans  support  this 
view.  This  chapter  reviews  the  evidence  suggesting  that  epigenetic  changes  are 
induced  by  the  early  environment  and  impact  the  regulation  of  gene  expression  in 
the  brain  increasing  the  risk  for  suicidal  behaviors. 


8.1    The  Burden  of  Early-Life  Adversity  and  Suicide 


Children  in  our  society  are  all  too  often  subjected  to  maltreatment,  which  is 
frequently  perpetrated  by  their  parents  or  caregivers.  Accordingly,  childhood  maltreatment
is  a  global  problem  of  significant  proportion  that  affects  children  of  all
ages,  races,  and  economic  and  cultural  backgrounds  (Children's  Bureau  2010).  There  are
four  main  types  of  childhood  maltreatment;  these  are  sexual  abuse,  physical  abuse, 
psychological  abuse,  as  well  as  parental  neglect  (Gilbert  et  al.  2009).  With  more 
than  three  million  reports  of  child  maltreatment  in  the  USA  in  2009  and  similar 
statistics  elsewhere  in  Western  societies  (Children's  Bureau  2010),  early-life
adversity  represents  one  of  the  major  risk  factors  associated  with  higher  prevalence
of  suicidal  behaviors  (Evans  et  al.  2005;  Santa  Mina  and  Gallop  1998).  From  an 
epidemiological  point  of  view,  trauma  exposure  in  children  is  estimated  to  range 
between  25  and  45  %,  although  the  rates  reported  vary  considerably  between 
studies  and  according  to  the  definition  of  abuse  types  (Scher  et  al.  2004;  Briere 
and  Elliott  2003;  Gorey  and  Leslie  1997;  McCauley  et  al.  1997;  Heim  et  al.  2010). 

The  economic  burden  of  child  maltreatment  and  trauma  resides  mainly  in  its 
impact  on  the  development  of  psychopathology  later  during  adulthood.  Indeed, 
child  trauma,  and  particularly  child  sexual  and  physical  abuse  (CSA  and  CPA),  is 
associated  with  increased  risks  of  psychiatric  disorders  including  depression,  anxiety,
bipolar  disorder,  substance  abuse,  and  suicide  (Evans  et  al.  2005;  Molnar
et  al.  2001;  Santa  Mina  and  Gallop  1998;  Heim  and  Nemeroff  2001; 
Kendler  et  al.  2000,  2004;  Kaplan  and  Klinetob  2000;  Agid  et  al.  1999;  Fergusson 
et  al.  1996).  In  addition  to  increasing  the  risk  of  psychiatric  disorders,  CSA  and 
CPA  also  associate  with  earlier  age  of  onset  of  psychopathology,  chronic  course, 
more  severe  outcomes,  poorer  recovery  rates,  and  more  importantly,  with
12  times  higher  odds  of  suicidal  behaviors  (Dinwiddie  et  al.  2000;  Gladstone
et  al.  2004;  Jaffee  et  al.  2002;  Zlotnick  et  al.  2001;  Brown  and  Moran  1994; 
Tanskanen  et  al.  2004;  Bensley  et  al.  1999;  Molnar  et  al.  2001). 


8.2    Epigenetic  Consequences  of  Early-Life 
Adversity  on  the  Brain 

Early-life  adversity  is  frequently  associated  with  maladaptive  patterns  of  behavioral 
responses  often  leading  to  pervasive  interpersonal  difficulties,  enhanced  reactivity 
to  stress,  and  increased  risk  of  psychopathology.  While  substantial  theoretical  and 
empirical  work  supports  the  relationship  between  childhood  adversity  and  development
of  negative  mental  health  outcomes  in  adulthood,  the  critical  question  has
been  what  molecular  processes  mediate  these  associations.  In  other  words:  "What 
long-lasting  molecular  mechanisms  take  place  as  a  result  of  the  adverse  life 
experience  that  could  be  associated  with  increased  risk  for  psychopathology?" 
Despite  the  complexity  of  this  question,  this  chapter  reviews  the  evidence 
suggesting  that  molecular  alterations  result  from  variation  in  early-life  environment 
through  epigenetic  processes  that  modulate  behaviors  in  animal  models  and 
increase  the  risk  for  suicide  in  humans. 

Although  the  same  DNA  is  found  in  every  cell  of  our  body,  cells  differentiate 
into  specific  cell  types  and  synthesize  different  proteins,  allowing  them  to  evolve 
and  adapt  to  specific  environments.  This  whole  process  is  believed  to  involve 
epigenetic  mechanisms. 

Epigenetics  refers  to  the  study  of  the  epigenome:  chemical  modifications  taking 
place  in  or  around  the  DNA  molecule  and  altering  the  capacity  of  a  gene  to  be 
activated  and  to  produce  the  mRNA  it  encodes.  There  are  several  epigenetic 
mechanisms,  including  histone  modifications  (Kouzarides  2007),  DNA  methylation
(Klose  and  Bird  2006),  and  hydroxymethylation  (Kriaucionis  and  Heintz  2009).
Another  mechanism,  which  is  not  a  priori  an  epigenetic  mechanism  but  is  often
classified  as  such,  is  the  posttranscriptional  regulation  of  gene  expression  via
microRNA  (He  and  Hannon  2004).  Given  the  high  complexity  of  DNA  organization,
these  modifications  are  expected  to  follow  defined  patterns  allowing  the
underlying  molecular  mechanisms  to  be  performed  correctly  and  to  decode  DNA 
in  the  context  of  chromatin. 

Within  the  cells'  nuclei,  DNA  is  packaged  into  a  structure  called  chromatin, 
composed  of  nucleosomes,  the  fundamental  unit  of  chromatin,  around  which  146  bp 
of  DNA  is  wrapped.  The  nucleosome  itself  is  formed  by  an  octamer  of  the  four  core 
histones  (H2A,  H2B,  H3,  and  H4),  globular  structures  with  a  tail  of  amino  acids 
which  can  be  modified  by  the  addition  or  the  removal  of  chemical  residues. 
Chromatin  has  two  structures:  euchromatin,  the  active  state  associated  with  gene 
transcription,  and  its  counterpart,  heterochromatin,  the  inactive  state  corresponding 
to  gene  repression. 

The  chromatin  state  is  dynamically  regulated  by  the  recruitment  of  proteins 
carrying  intrinsic  enzymatic  activity  to  histone  modifications  (Clements  et  al.  2003; 
Fischle  et  al.  2005;  Nelson  et  al.  2006;  Pray-Grant  et  al.  2005;  Santos-Rosa 
et  al.  2003)  which  induce  the  opening  or  the  closing  of  chromatin  (Wysocka 
et  al.  2005,  2006).  Up  to  eight  types  of  histone  modifications  have  been 
characterized  (methylation  (lysine,  arginine),  acetylation,  phosphorylation, 
ubiquitylation,  sumoylation,  deimination,  ADP  ribosylation,  and  proline  isomerization),
although  they  might  not  be  all  found  in  eukaryotic  cells  (Kouzarides  2007).
While  these  marks  may  contribute  concomitantly  to  regulate  gene  expression,  most 
attention  has  been  focused  on  lysine  methylation  and  acetylation.  For  instance, 
methylation  at  specific  lysines  (K)  of  histone  H3,  such  as  the  fourth  (H3K4),
36th  (H3K36),  and  79th  (H3K79)  (Kirmizis  et  al.  2007;  Salcedo-Amaya  et  al.  2009;
Barrera  et  al.  2008;  Pokholok  et  al.  2005;  Wang  et  al.  2008;  Xiao  et  al.  2007)  has 
been  associated  with  active  transcription  while  methylation  at  H3K9,  H3K27,  and 
H4K20  often  correlates  with  transcriptional  repression  (Wang  et  al.  2008;  Barski 
et  al.  2007;  Bannister  et  al.  2001;  Botuyan  et  al.  2006;  Lan  et  al.  2007;  Nielsen 
et  al.  2001;  Sanders  et  al.  2004;  Swigut  and  Wysocka  2007).  It  is  likely  that  each  of 
these  modifications  has  a  distinct  signature  profile  which  may  overlap  to  form  a 
histone  code  controlling  the  structure  of  the  chromatin,  as  well  as  gene  transcription,
according  to  cell  needs.  However,  this  code  is  still  far  from  being  cracked  or
understood. 

DNA  methylation  is  a  covalent  modification  that  refers  mainly  to  the
transfer  of  a  methyl  group  (CH3)  from  an  S-adenosyl-L-methionine  (AdoMet)  donor
to  the  5  carbon  of  the  cytosine  in  CpG  dinucleotide  sequences.  This  process
requires  the  enzymatic  activity  of  DNA  methyltransferase  (DNMT)  proteins,  among
which  DNMT3a  and  DNMT3b  are  called  de  novo  methylases  because  they  introduce
methyl  groups  at  previously  unmethylated  cytosines  (Hata  et  al.  2002;  Okano
et  al.  1998).  In  contrast,  DNA  hydroxymethylation  refers  to  the  oxidation  of
preexisting  5-methylcytosine  to  5-hydroxymethylcytosine  by  enzymes  of  the  TET
family  (Tahiliani  et  al.  2009;  Ito  et  al.  2010).

Fig.  8.1  The  postulated  relationships  between  early-life  adversity,  epigenetic  regulation,  and
psychopathology 


In  somatic  cells,  approximately  80  %  of  CpGs  are  methylated  (Tucker  2001). 
The  remaining  unmethylated  ones  are  grouped  in  CpG-enriched  regions  called  CpG 
islands,  often  found  in  the  5′  regulatory  regions  of  genes.  Thus,  DNA  methylation  in
gene  promoters  has  classically  been  associated  with  transcriptional  repression  by
interfering  with  the  recruitment  and  the  binding  of  the  transcriptional  machinery  to 
gene  regulatory  regions  (Klose  and  Bird  2006).  However,  when  DNA  methylation 
is  found  within  the  gene  body,  it  has  been  associated  with  transcriptional  activation 
and  alternative  transcript  selection  (Maunakea  et  al.  2010).  In  contrast,  recent 
findings  suggest  that  hydroxymethylation  is  enriched  in  the  gene  body  of  active 
genes  (Mellen  et  al.  2012).  Moreover,  different  cell  types  exhibit  distinct  methylation
and  hydroxymethylation  patterns  that  confer  a  specificity  of  expression  based
on  the  requirements  of  each  cell  type  (Mellen  et  al.  2012;  Iwamoto  et  al.  2011). 
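
The notion of a CpG-enriched region can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the studies cited above: it computes the GC fraction and the observed/expected CpG ratio of a sequence and applies the commonly used thresholds of at least 200 bp, GC content above 50 %, and an observed/expected ratio above 0.6. These thresholds, the function names, and the example sequences are assumptions introduced here.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction, and observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    # "CG" cannot overlap with itself, so str.count gives the exact CpG count.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were distributed independently.
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "cpg_obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed length, GC content, and obs/exp thresholds to one sequence."""
    s = cpg_stats(seq)
    return (s["length"] >= min_len
            and s["gc_fraction"] > min_gc
            and s["cpg_obs_exp"] > min_obs_exp)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real genomic loci.
    print(looks_like_cpg_island("CG" * 100))
    print(looks_like_cpg_island("AT" * 100))
```

Genome-wide annotation normally scans a sliding window along the chromosome rather than scoring a single fragment; the single-sequence form is kept here only for brevity.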

In  the  following  sections,  we  review  findings  in  gene  systems  that  have  been 
targeted  by  studies  investigating  epigenetic  factors  associated  with  the  social 
environment,  specifically  in  the  context  of  suicide.  We  review  animal  studies 
using  models  of  early-life  environmental  variation,  and  studies  focusing  on  suicide 
emphasizing  those  investigating  the  effect  of  early-life  adversity.  Overall  we 
review  the  evidence  suggesting  that  epigenetic  mechanisms  may  be  involved  in 
the  modification  of  gene  expression  induced  by  environmental  factors.  While  most 
individuals  who  die  by  suicide  do  not  have  a  history  of  abuse  during  childhood,  a 
significant  minority  does,  and  in  this  subgroup,  the  association  is  very  strong. 
Among  the  systems  reviewed  in  this  chapter,  we  focus  in  particular  on  gene  systems 
coding  for  components  of  the  hypothalamus-pituitary-adrenal  (HPA)  axis  and 
related  signaling  hormones  and  molecules,  as  well  as  neurotrophic  factors,  their 
receptors,  and  neurotransmitters.  Figure  8.1  displays  the  postulated  relationships 
between  early-life  adversity,  epigenetic  regulation,  and  psychopathology.


8.3    HPA  Axis  Alterations  Induced  by  Early-Life  Adversity

Child  abuse  has  been  proposed  to  induce  its  long-term  behavioral  consequences 
partly  by  altering  the  neural  circuits  involved  in  the  regulation  of  stress  (Heim 
et  al.  2008b).  The  HPA  axis  is  the  main  stress  regulatory  system  (Pariante  and 
Lightman  2008).  Under  stressful  conditions,  corticotropin-releasing  factor  (CRF) 
and  vasopressin  (AVP)  are  released  from  the  hypothalamus.  CRF  and  AVP  induce
the  release  of  adrenocorticotropic  hormone  (ACTH)  and  pro-opiomelanocortin
(POMC)  from  the  pituitary  gland  into  the  blood,  which  then  travel  to  the  adrenal
cortex,  where  they  induce  the  release  of  glucocorticoids  (cortisol  in  humans  and
corticosterone  in  rodents)  into  the  blood.  Glucocorticoids  then  act  at  each  level  of
the  HPA  axis  to  decrease  the  release  of  CRF,  AVP,  POMC,  and  ACTH  and  regulate 
the  stress  response.  While  the  HPA  axis  can  be  regulated  at  different  levels,  the 
main  locus  of  regulation  lies  in  the  hippocampus  where  glucocorticoids  bind 
glucocorticoid  receptors  (GR)  and  induce  an  inhibitory  feedback  on  the  activation 
of  the  HPA  axis  to  bring  the  activity  of  the  stress  response  back  to  basal  levels. 
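
The negative feedback loop described above can be illustrated with a toy simulation. The sketch below is a deliberately simplified model in arbitrary units, not a physiological model and not drawn from the cited literature: stress drives CRF, CRF drives ACTH, ACTH drives glucocorticoid release, and glucocorticoids inhibit CRF and ACTH release in proportion to an assumed GR level. The equations, parameter names, and values are all assumptions made for illustration.

```python
def simulate_hpa(stress: float = 1.0, gr_level: float = 1.0,
                 k_feedback: float = 1.0, decay: float = 1.0,
                 dt: float = 0.01, steps: int = 5000) -> float:
    """Return the glucocorticoid level reached after forward-Euler integration.

    All quantities are in arbitrary units; gr_level scales the strength of
    glucocorticoid feedback (a hypothetical stand-in for GR expression).
    """
    crf = acth = cort = 0.0
    for _ in range(steps):
        # Feedback term: more glucocorticoid or more GR means stronger inhibition.
        inhibition = 1.0 + k_feedback * gr_level * cort
        d_crf = stress / inhibition - decay * crf
        d_acth = crf / inhibition - decay * acth
        d_cort = acth - decay * cort
        crf += dt * d_crf
        acth += dt * d_acth
        cort += dt * d_cort
    return cort


if __name__ == "__main__":
    # Compare a normal and a halved GR level under the same stress input.
    print("GR = 1.0:", simulate_hpa(gr_level=1.0))
    print("GR = 0.5:", simulate_hpa(gr_level=0.5))
```

In this toy model, lowering `gr_level` weakens the feedback and the simulated glucocorticoid level settles at a higher value, which is the qualitative behavior the text attributes to reduced GR-mediated feedback.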

From  a  structural  point  of  view,  childhood  abuse  and  neglect  have  been 
associated  with  volume  loss  in  the  hippocampus  (Bremner  et  al.  1997;  Stein 
et  al.  1997;  Driessen  et  al.  2000),  altered  cortical  symmetry  in  the  frontal  lobe 
(Carrion  et  al.  2001)  and  superior  temporal  gyrus  (de  Bellis  et  al.  2002),  as  well  as  a 
reduced  neuronal  density  and/or  neuronal  integrity  in  the  anterior  cingulate  gyrus
(de  Bellis  et  al.  2002).  One  study  also  reported  poorer  hippocampal  activation  on  a 
memory  task  in  patients  with  a  history  of  childhood  abuse  (Bremner  et  al.  2003). 
Importantly,  the  structural  consequences  of  child  abuse  are  thought  to  be  time 
dependent,  implying  that  particular  brain  regions  may  have  unique  windows  of 
vulnerability  to  the  effects  of  child  abuse  (Andersen  et  al.  2008). 

From  a  molecular  point  of  view,  depressed  patients  with  a  history  of  child  abuse 
have  been  reported  to  exhibit  higher  ACTH  and  cortisol  levels  following  stress  and
dexamethasone  (DEX)  challenges  (Heim  et  al.  2000,  2008a).  Interestingly,  in  these
studies,  neither  ACTH  nor  cortisol  levels  differed  significantly  between
depressed  subjects  without  a  history  of  childhood  abuse  and  controls  (Heim 
et  al.  2000,  2008a).  Childhood  abuse,  and  particularly  physical  abuse,  has  also 
been  shown  to  increase  corticotropin-releasing  hormone  (CRH)  levels  (Heim 
et  al.  2008b;  Carpenter  et  al.  2004)  and  to  decrease  cerebrospinal  fluid  oxytocin
levels  (Heim  et  al.  2009).  More  recently,  low  hippocampal  GR  levels  have  been 
reported  in  suicide  completers  with  a  history  of  childhood  abuse  (McGowan 
et  al.  2009;  Labonte  et  al.  2012b)  but  not  in  non-abused  suicide  completers. 
Altogether,  these  alterations  are  believed  to  lead  to  important  behavioral  changes 
that  may  increase  the  predisposition  toward  suicidal  behavior  later  in  life. 

This  work  is  also  substantiated  with  findings  from  animal  work.  For  instance,  the 
development  of  the  HPA  axis  has  been  shown  to  be  modulated  by  maternal 
behavior  in  rats.  Depressive-like  behaviors  (Francis  et  al.  1999)  associated  with 
altered  HPA  axis  feedback  (Liu  et  al.  1997)  and  low  GR  mRNA  hippocampal  levels 
(Liu  et  al.  1997)  are  common  features  in  rats  raised  by  mothers  providing  maternal
care  defined  by  low  levels  of  licking  and  grooming  (LG).  A  comprehensive  model
involving  modulation  at  numerous  levels,  including  hormonal,  synaptic,  and  molecular
changes,  has  been  proposed  in  an  attempt  to  characterize  the  molecular
pathways  involved  in  the  modulatory  effects  of  maternal  behavior  in  rats.  High 
maternal  LG  levels,  which  can  also  be  mimicked  by  handling  pups  during  early  life, 
induce  a  physiological  response  involving  an  increase  in  plasma  thyroid  hormone
levels.  This  increases  serotonin  (5-HT)  activity  in  the  raphe  nuclei,  and  consequently  stimulates
5-HT  turnover  in  the  hippocampus  and  frontal  cortex  (Mitchell
et  al.  1990;  Smythe  et  al.  1994;  Meaney  et  al.  1987).  Via  activation  of  the 
G-protein-coupled  5-HT7  receptor  (Laplante  et  al.  2002),  it  is  believed  that  5-HT 
activates  a  cAMP/PKA-dependent  intracellular  cascade  increasing  the  expression
of  nerve  growth  factor-inducible  protein  A  (NGFI-A)  and  activator  protein-2  (AP-2)  in  the  hippocampus
(Meaney  et  al.  2000).  NGFI-A  and  AP-2  are  activating  transcription  factors
with  putative  binding  sites  within  the  GR  promoter  region  (McCormick  et  al.  2000) 
that  increase  GR  mRNA  levels  in  the  hippocampus  of  the  offspring.  This  complex 
process  is  attenuated  in  rats  raised  by  low  LG  mothers  according  to  the  molecular 
and  behavioral  processes  mentioned  previously  and  resulting  in  relatively  lower  GR 
expression  in  the  hippocampus  (Weaver  et  al.  2004;  McGowan  et  al.  2011).  Interestingly,
most  of  these  regulatory  changes  are  temporally  stable  and  are  maintained
throughout  adulthood.  Moreover,  cross-fostering  studies  report  that  these  behavioral
and  molecular  modifications  are  reversed  when  pups  raised  by  low  LG
mothers  are  transferred  to  high  LG  mothers  (Liu  et  al.  1997;  Francis  et  al.  1999) 
within  the  first  week  of  life. 


8.4    The  Glucocorticoid  Receptor  Gene 


As  the  HPA  programming  by  maternal  behavior  is  modified  by  cross-fostering  and 
temporally  stable,  researchers  hypothesized  that  the  long-term  effects  of  maternal 
behavior  and  early-life  environment  variation  on  GR  hippocampal  expression  could 
be  due  to  epigenetic  modifications.  In  rats,  the  GR  gene  is  preceded  by  10  noncoding 
exons  and  by  14  in  humans  (McCormick  et  al.  2000;  Turner  and  Muller  2005).  The 
expression  of  the  noncoding  exon  1₇  in  rats  and  of  its  human  homologue  1F  has  been
shown  to  be  specific  to  the  hippocampus  (Turner  and  Muller  2005).  Each  of  the
untranslated  exon  1  variants  has  its  own  promoter  and  multiple  transcription  factor- 
binding  sites,  including  NGFI-A  (Meaney  2001),  have  been  identified  in  GR 
promoter  sequences  (Turner  et  al.  2008,  2010).  In  offspring  raised  by  low  LG  rat 
mothers,  CpG  methylation  levels  in  the  exon  1₇  promoter  region  are  significantly
increased  at  almost  all  CpGs  compared  to  offspring  raised  by  high  LG  mothers.
More  importantly,  one  CpG  located  in  the  5′  end  of  a  NGFI-A-binding  site  is
methylated  in  almost  100  %  of  offspring  raised  by  low  LG  mothers  whereas  it  is
rarely  methylated  in  offspring  from  high  LG  mothers  (Weaver  et  al.  2004).
Interestingly,  follow-up  studies  in  high  and  low  LG  rats  showed  that  DNA  methylation
levels  are  also  increased  in  the  promoters  of  other  noncoding  first  exons  of  the
GR  gene  which  were  associated  with  transcriptional  changes  (McGowan
et  al.  2011)  suggesting  that  the  whole  GR  locus  may  be  poised  for  epigenetic 
regulation  by  environmental  social  factors. 
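
Site-by-site comparisons of this kind are often summarized as the percentage of sequenced clones (or reads) methylated at each CpG. The sketch below shows that calculation under stated assumptions: each clone is encoded as a string with one character per CpG site, "M" for methylated and "U" for unmethylated. The encoding, the function name, and the example data are hypothetical and do not reproduce values from the studies cited above.

```python
def percent_methylation(clones: list[str]) -> list[float]:
    """Return the percentage of clones methylated at each CpG site.

    Each clone is a string with one character per CpG site:
    'M' = methylated, 'U' = unmethylated (an assumed encoding).
    """
    if not clones:
        return []
    n_sites = len(clones[0])
    if any(len(clone) != n_sites for clone in clones):
        raise ValueError("all clones must cover the same number of CpG sites")
    return [
        100.0 * sum(clone[i] == "M" for clone in clones) / len(clones)
        for i in range(n_sites)
    ]


if __name__ == "__main__":
    # Hypothetical clone data for two groups, four CpG sites each.
    low_lg_clones = ["MMMU", "MMMM", "MUMM", "MMMM"]
    high_lg_clones = ["UUMU", "UUUU", "UMUU", "UUUU"]
    print("low LG: ", percent_methylation(low_lg_clones))
    print("high LG:", percent_methylation(high_lg_clones))
```

With real data, per-site percentages from several animals per group would then be compared statistically rather than read off a single set of clones.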

These  findings  were  recently  translated  to  humans  through  studies  investigating 
hippocampal  tissue  from  individuals  who  died  by  suicide  with  and  without  a  history 
of  childhood  adversity,  as  well  as  normal  controls  (McGowan  et  al.  2009;  Labonte 
et  al.  2012b).  Notably,  methylation  levels  in  the  exon  1F  promoter  in  abused  suicide 
completers  were  significantly  higher  than  among  non-abused  suicides  and  healthy 
controls.  In  addition,  similarly  to  what  was  found  in  rats,  a  significant 
hypermethylation  in  an  NGFI-A-binding  site  was  found  in  abused  suicide 
completers  but  not  in  the  other  groups.  Through  a  series  of  cell  functional  assays, 
this  epigenetic  mark  was  shown  to  repress  the  binding  of  NGFI-A  to  its  cognate 
DNA  sequence  and  to  decrease  GR  transcription  (McGowan  et  al.  2009). 

It  is  also  interesting  to  note  that  these  findings  have  been  supported  by  other 
groups  with  different  populations  of  individuals  that  suffered  from  early-life  adversity.
Indeed,  higher  levels  of  methylation  in  the  promoter  of  GR  1F  have  been
reported  in  the  infants  of  mothers  reporting  intimate  partner  violence  during  their 
pregnancy  compared  to  those  born  from  normal  mothers  (Radtke  et  al.  2011). 
Another  study  reported  significant  correlations  between  GR  1F  promoter  methylation
levels  and  parental  loss,  child  maltreatment,  and  parental  care  (Tyrka
et  al.  2012).  Furthermore,  DNA  methylation  levels  in  GR  1F  promoter  were 
shown  to  be  positively  correlated  with  childhood  sexual  abuse,  its  severity,  and 
the  number  of  maltreatment  types  in  individuals  with  MDD,  and  with  repetition  of 
severe  types  of  abuse  in  patients  with  bipolar  disorders  (Perroud  et  al.  2011). 
Altogether,  this  suggests  that  early-life  adversity  may  induce  specific  long-lasting 
epigenetic  alterations  affecting  gene  expression. 

In  a  different  study  assessing  the  expression  of  several  GR  exon  1  variants 
expressed  in  the  limbic  system  of  depressed  suicide  completers,  GR1F  and  GR1C 
hippocampal  expression  were  significantly  decreased  in  depressed  suicide 
completers  (Alt  et  al.  2010).  However,  this  was  not  associated  with  promoter 
hypermethylation,  although  it  should  be  noted  that  this  study  investigated  methylation
only  in  a  limited  region  and  promoter  methylation  levels  reported  were
particularly  low.  On  the  other  hand,  NGFI-A  protein  levels  in  the  hippocampus  (HPC)  were
significantly  decreased  in  depressed  suicide  completers  suggesting  that  the  decrease 
in  GR  expression  found  in  suicide  completers  may  be  mediated  by  different 
molecular  pathways  depending  on  the  presence  or  the  absence  of  early-life 
adversity. 

More  recently,  our  group  pushed  further  the  investigation  of  early-life  adversity 
consequences  on  the  epigenetic  regulation  of  GR  in  the  HPC  of  abused  suicide 
completers.  Our  data  indicated  that  the  expression  of  the  noncoding  exons  1B,  1C,
and  1H  is  significantly  decreased  in  suicide  completers  with  a  history  of  childhood
abuse  compared  to  non-abused  suicides  and  controls.  The  assessment  of  methylation
levels  in  the  promoter  of  GR1C  revealed  methylation  differences  that  are
inversely  correlated  with  GR1C  expression  in  accordance  with  our  previous  finding 
on  the  1F  variant.  On  the  other  hand,  the  GR1H  promoter  showed  site-specific
hypomethylation  that  was  positively  correlated  with  hGR1H  expression.  In  other
words,  lower  levels  of  methylation  were  significantly  correlated  with  lower  expression,
suggesting  that  active  demethylation  is  also  a  functional  mechanism  that  may
be  affected  by  early-life  adversity.  Because  this  mechanism  has  received  less
attention,  more  work  is  required  in  order  to  elucidate  its  potential  implications  in
the  context  of  early-life  adversity. 

In  addition  to  DNA  methylation,  chromatin  changes  have  also  been  associated 
with  less  frequent  maternal  stimulation.  For  instance,  H3K9  acetylation,  a  marker 
of  open  euchromatin  state  (Kouzarides  2007),  was  found  to  be  lower  in  the  GR1₇
promoter  in  low  LG  raised  rats.  Pharmacological  challenge  with  the  histone
deacetylase  inhibitor  trichostatin  A  (TSA)  restored  methylation  levels,  increased  NGFI-A  binding
to  the  promoter,  and  reinstated  H3K9  acetylation  and  GR  hippocampal  levels 
(Weaver  et  al.  2004).  Treated  rats  were  also  less  reactive  to  stressful  conditions. 
Decreased  H3K9  acetylation  reduces  access  to  DNA  for  the  transcriptional  machinery  and
DNA-binding  proteins  such  as  transcription  factors  and  methylated  DNA-binding
proteins.  Functionally,  these  results  suggest  that  variation  in  the  early-life
environment  in  rats  and  early-life  adversity  in  humans  induce  a  coordinated
remodeling  of  epigenetic  mechanisms  involving  DNA  methylation  and  chromatin 
modifications  in  the  multiple  promoters  of  GR  leading  to  important  changes  in  GR 
expression,  and  consequent  regulation  of  the  HPA  axis. 


8.5    The  Vasopressin  (AVP)  and  Corticotropin-Releasing
Factor  Genes 

Other  components  of  the  HPA  axis  have  also  been  shown  to  be  affected  by  early- 
life  stress.  For  instance,  early-life  infant-maternal  separation  in  mice,  inducing 
stress-coping  behavioral  alterations  in  pups,  has  been  shown  to  be  associated  with 
a  long-lasting  increase  in  corticosterone  secretion  and  with  an  increased  expression 
of  POMC  and  AVP  in  the  paraventricular  nucleus  (PVN)  of  the  hypothalamus 
(Murgatroyd  et  al.  2009).  The  AVP  gene  in  mice  is  composed  of  three  coding  exons 
and  is  oriented  tail  to  tail  with  the  oxytocin  (Oxt)  gene.  Interestingly,  the  intergenic 
region  between  the  AVP  and  the  Oxt  genes  has  been  shown  to  include  an  enhancer 
modulating  AVP  expression  (Gainer  et  al.  2001)  and  is  itself  composed  of  a  CpG 
island  (Murgatroyd  et  al.  2009). 

Methylation  at  multiple  sites  within  the  AVP  enhancer  was  shown  to  be 
decreased  in  the  PVN  of  stressed  mice  6  weeks,  3  months,  and  1  year  following 
the  stress  regimen  (Murgatroyd  et  al.  2009).  Consistent  with  the  repressive  role  of 
DNA  methylation  on  expression,  this  was  associated  with  overexpression  of  AVP 
gene.  The  regulatory  properties  of  this  enhancer  were  defined  by  a  deletion  experiment.
Deleting  the  first  part  of  the  enhancer  partially  reduced  transcriptional
activity,  while  removing  the  entire  enhancer  almost  completely  abolished  the
gene's  activity.  Furthermore,  methylation  of  the  enhancer  also  significantly  reduced
transcriptional  activity.  Interestingly,  AVP  expression  was  also  significantly 
increased  in  stressed  mice  at  10  days,  although  no  methylation  differences  were 
observed  in  the  AVP's  enhancer.  The  AVP  enhancer  can  putatively  bind  the 
methylated  CpG-binding  protein  MeCP2.  However,  because  of  the  repressive  role 
of  MeCP2  on  transcription,  one  would  expect  the  opposite  tendency  concerning 
AVP  expression.  MeCP2  has  nevertheless  been  shown  to  be  susceptible  to  inactivation
by  neuronal  depolarization-induced  phosphorylation,  leading  to  its  dissociation
from  putative  targets  (Chen  et  al.  2003;  Zhou  et  al.  2006).  Accordingly,  higher
neuronal  activity-induced  CaMKII  immunoreactivity  and  phosphorylated  MeCP2
levels  have  been  reported  in  AVP-expressing  neurons  in  the  PVN  of  10-day-old
stressed  mice.  Altogether,  these  results  suggest  that  in  young  stressed  mice,  methylation
patterns  in  AVP's  enhancer  allow  the  binding  of  MeCP2,  which  could  then
repress  expression.  However,  since  early-life  stress  also  increases  neuronal  activity
in  AVP-expressing  neurons,  MeCP2  gets  phosphorylated  and  inactivated.  Consequently,
the  repressive  effect  of  MeCP2  on  AVP  expression  is  abolished.  On  the
other  hand,  methylation  levels  in  the  AVP  enhancer  decrease  with  time.  This  may
decrease  MeCP2  binding  and  allow  AVP  to  be  expressed  at  higher  levels.  Overall,
these  results  suggest  that  alterations  in  DNA  methylation  found  outside  of
the  promoter  might  also  be  involved  in  physiological  and  behavioral  modifications 
induced  by  environmental  factors. 

A related peptide, CRF, has also recently been shown to be epigenetically
regulated by the social environment. Indeed, CRF
expression  in  the  PVN  of  chronically  socially  defeated  mice  was  found  to  be 
increased  (Elliott  et  al.  2010).  Interestingly,  this  effect  was  found  only  in  animals 
susceptible to social stress and showing the normal subordinated behavior following
chronic exposure to aggressive littermates as opposed to the resilient mice
continuing to interact with their aggressor. This was associated with lower levels of
methylation as reported by a reduced number of methylated clones in the susceptible
group compared to control and resilient mice. A closer look at the methylation
alteration induced by chronic social stress pointed to a single site of
hypomethylation in the proximal promoter flanking the first exon and known to
bind the cyclic adenosine monophosphate (cAMP) response element-binding protein
(Aguilera et al. 2007). The importance of this site was further confirmed by
luciferase  assays  showing  that  mutating  a  single  base  in  the  CRE-binding  site 
substantially  reduced  the  cAMP-induced  CRF  promoter  activity  (Elliott 
et  al.  2010).  These  changes  in  methylation  and  expression  were  also  accompanied 
by  a  significant  decrease  in  the  DNA  methyltransferase  3b  expression  and  by  an 
increase  in  the  expression  of  the  demethylating  candidate  gadd45b.  Interestingly, 
chronic  treatment  with  the  tricyclic  antidepressant  imipramine  attenuated  the 
changes  in  DNA  methylation  and  expression  levels  induced  by  social  stress  (Elliott 
et  al.  2010).  Consequently,  these  findings  in  both  the  AVP  and  the  CRF  genes 
strongly  support  the  involvement  of  active  demethylation  in  the  long-term  effects  of 
early-life adversity.


8.6    The Brain-Derived Neurotrophic Factor Gene

Neurotrophic factors are important candidate molecules to understand the development
of psychopathology because of their role in neuronal survival and plasticity,
as  well  as  their  expression  in  brain  regions  from  the  limbic  system,  where  emotions 
and  related  behaviors  are  processed.  For  instance,  it  is  hypothesized  that  their 
alteration  could  partly  underlie  changes  in  plasticity  observed  in  the  brains  of 
suicides  as  well  as  the  mood  symptoms  observed  in  depressive  patients.  While 
the  major  neurotrophic  factors  include  nerve  growth  factor  (NGF),  neurotrophin 
3  and  4  (NT3/4),  fibroblast  growth  factor  (FGF),  transforming  growth  factor  (TGF), 
and  brain-derived  neurotrophic  factor  (BDNF),  the  latter  has  received  most  of  the 
attention  in  neurobiological  research  of  psychiatric  conditions  such  as  depressive 
disorders  and  suicide.  For  instance,  low  serum  and  brain  BDNF  expression  has  been 
reported  in  patients  with  major  depression  (Brunoni  et  al.  2008;  Dwivedi 
et al. 2003; Pandey et al. 2008) and these alterations were reversed by antidepressant
treatment (Chen et al. 2001; Sen et al. 2008; Matrisciano et al. 2009). In mice,
BDNF depletion induces depressive-like behaviors (Chan et al. 2006) while in rats,
chronic  stress  and  persistent  pain  reduce  BDNF  expression  in  the  hippocampus 
(Gronli  et  al.  2006;  Duric  and  McCarson  2005),  and  these  effects  are  counteracted 
by  antidepressant  treatment  (Duric  and  McCarson  2006;  Rogoz  et  al.  2005;  Xu 
et  al.  2006). 

BDNF epigenetic regulation has recently been investigated in mouse and rat
models of stress-induced depressive symptoms (Tsankova et al. 2006; Roth
et al. 2009), as well as in a rat model of exposure to traumatic events (Roth
et al. 2011). In both species, the BDNF gene contains nine 5′ noncoding first
exons, each with its own promoter, but coding for the same protein (Aid et al. 2007).
The  alternative  splicing  of  these  exons  specifies  the  tissue  in  which  BDNF  is 
expressed  (Aid  et  al.  2007).  In  both  species,  epigenetic  processes  involved  in  the 
transcriptional  control  of  BDNF  have  been  shown  to  be  altered  by  stress.  For 
instance,  chronic  social  stress  in  mice  decreases  the  expression  of  two  specific 
BDNF  transcripts  (III  and  IV)  in  the  hippocampus  (Tsankova  et  al.  2006),  while 
maternal  maltreatment  decreases  prefrontal  cortex  (PFC)  BDNF  mRNA  expression 
in  rats  (Roth  et  al.  2009).  Although  similar,  these  transcriptional  alterations  were 
shown  to  be  induced  by  different  epigenetic  mechanisms.  Indeed,  chronic  stress  in 
mice raises H3K27 dimethylation levels at the promoters of transcripts III and IV
(Tsankova et al. 2006), while site-specific hypermethylation is found at the promoters
of transcripts IV and IX in maltreated rats (Roth et al. 2009). In the latter study,
site-specific hypermethylation seems to follow a developmental pattern, with exon IX
promoter  hypermethylation  occurring  immediately  after  the  maltreatment  regimen, 
while  promoter  IV  methylation  increases  gradually  to  reach  significantly  altered 
levels  only  at  adulthood.  Surprisingly,  in  one  of  these  studies  (Tsankova 
et  al.  2006),  no  DNA  methylation  difference  was  found  in  association  with  histone 
modifications,  while  no  histone  modification  was  reported  in  association  with  DNA 
methylation  alterations  in  the  other  study  (Roth  et  al.  2009).  These  findings 
illustrate that early-life or chronic stressors may alter different epigenetic
mechanisms  with  common  transcriptional  consequences:  the  latter  leading  to  the 
compaction  of  chromatin  in  its  heterochromatic  state,  and  the  former,  blocking  the 
binding  of  transcription  factors  to  DNA.  On  the  other  hand,  these  results  may  also 
highlight  the  heterogeneity  of  stress-induced  epigenetic  alterations  between 
species. 

Recently,  the  epigenetic  regulation  of  BDNF  has  been  shown  to  be  altered  in  a 
rat  model  of  PTSD  symptoms  (Roth  et  al.  2011).  Given  the  findings  discussed 
above,  the  authors  focused  on  exon  IV.  Stressed  rats  showed  increased  DNA 
methylation  in  the  dorsal  dentate  gyrus  and  in  the  CA3  regions.  Contrary  to 
expectation,  a  global  hypomethylation  was  observed  in  the  ventral  CA1  region. 
These  epigenetic  alterations  were  accompanied  by  significant  downregulation  of 
BDNF  exon  IV  expression  in  both  the  dorsal  and  ventral  CA1  regions  in  the  stressed 
rats  relative  to  non-stressed  rats.  Interestingly,  these  alterations  were  restricted  to 
the  hippocampus  since  no  alterations  were  found  in  the  basolateral  amygdala  nor  in 
the  medial  prefrontal  cortex.  These  findings  suggest  that  DNA  methylation  may  be 
affected  differently  within  the  same  structure  depending  on  the  function  and  the 
connections  these  regions  have.  Given  that  the  expression  of  exon  IV  was  decreased 
in  the  ventral  CA1  without  any  significant  changes  in  DNA  methylation,  these 
findings  also  suggest  that,  although  DNA  methylation  may  have  an  important  role 
in  the  regulation  of  BDNF,  other  mechanisms  are  probably  involved. 

Pharmacological  treatment  with  the  tricyclic  antidepressant  imipramine  was 
able  to  reverse  the  effect  of  chronic  stress  on  BDNF  transcription  in  mice 
(Tsankova  et  al.  2006).  However,  this  reversal  does  not  seem  to  be  due  to  the 
reinstatement  of  altered  histone  modifications  but  rather  due  to  alteration  of  an 
indirect pathway. Indeed, chronic imipramine treatment did not restore basal
H3K27 dimethylation levels; rather, chronic (but not acute) treatment decreased
HDAC5 levels in the hippocampus of chronically stressed mice, leading to a global
hyperacetylation at the promoter regions of transcripts III and IV. The importance of histone
acetylation  in  the  effect  of  antidepressant  treatment  has  indeed  been  previously 
reported in animal models of stress-induced depression (Sun et al. 2013). Additionally,
this hyperacetylation was associated with higher hippocampal levels of H3K4
dimethylation in the area of BDNF III and IV promoters, with both modifications
related to transcriptional activation. Consequently, these results suggest the existence
of a compensatory mechanism in the reinstatement of basal BDNF levels by
chronic  imipramine  treatment  following  chronic  stress,  and  they  emphasize  the 
importance  of  chromatin  hyperacetylation  induced  by  antidepressant  treatment. 

Recently,  the  methylation  state  of  BDNF  was  also  assessed  in  postmortem  brains 
from suicide completers (Keller et al. 2010). The human BDNF gene is also
composed of 11 exons preceded by nine noncoding first exons regulating BDNF
expression in different tissues (Pruunsild et al. 2007). In the study by Keller et al. (2010),
three different methods were used to quantify methylation levels in a region
encompassing part of noncoding exon IV and its promoter in Wernicke's
area. Their results show that methylation at four CpGs located downstream of the
promoter IV transcription initiation site was significantly increased in suicide
completers compared to controls. These differences were specific to the BDNF
promoter since the investigation of genome-wide methylation in these subjects did
not  reveal  any  significant  difference  between  groups.  In  addition,  BDNF  expression 
in  subjects  with  high  methylation  levels  was  significantly  lower  than  in  subjects 
with  low  and  medium  methylation  levels,  supporting  the  repressive  effects  on 
transcription  of  methylation  within  the  promoter. 

Finally,  following  the  studies  in  mice  reported  above  (Tsankova  et  al.  2006),  our 
group  provided  evidence  suggesting  that  antidepressants  promote  open  chromatin 
structure  (i.e.,  lower  H3K27me3  level)  in  the  promoter  of  BDNF  in  the  PFC  (Chen 
et al. 2011). Follow-up studies in depressed patients also revealed higher BDNF IV
noncoding exon expression in the blood of citalopram treatment responders compared
to nonresponders (Lopez et al. 2012). Interestingly, H3K27me3 levels were
inversely  correlated  with  both  BDNF  IV  expression  levels  and  with  the  severity  of 
symptoms. 


8.7    The  Ribosomal  RNA  Gene 

Ribosomal  RNA  (rRNA)  decodes  the  mRNA  into  amino  acids.  Hence,  rRNA  is  a 
bottleneck  structure  for  protein  synthesis,  allowing  adequate  cell  function 
depending  on  the  cell  needs.  The  rRNA  promoter  is  composed  of  two  regulatory 
regions,  namely,  the  upstream  control  element  (UCE)  and  the  core  promoter  that 
binds  the  upstream  binding  factor  (UBF)  (Haltiner  et  al.  1986;  Learned  et  al.  1986; 
Ghoshal  et  al.  2004).  The  expression  of  rRNA  genes  has  been  shown  to  be 
epigenetically  regulated  both  in  mice  (Santoro  and  Grummt  2001)  and  humans 
(Brown and Szyf 2007; Ghoshal et al. 2004). In mice, the recruitment of transcription
repressors has been suggested to induce chromatin modifications leading to
methylation  of  a  single  CpG  found  within  UBF-binding  sites  in  the  UCE.  This  is 
thought  to  prevent  UBF  binding  to  its  cognate  sequence  and  to  decrease  rRNA 
expression (Santoro and Grummt 2001). In humans, despite the fact that the CpG
density in both promoter regions differs from that in mice (Santoro and Grummt 2001;
Ghoshal et al. 2004), rRNA expression has nevertheless been shown to be epigenetically
regulated (Brown and Szyf 2007). Indeed, the active portion of the rRNA
promoter  associated  with  pol  I  has  been  shown  to  be  completely  unmethylated 
while  the  inactive  portion  is  almost  fully  methylated  (Brown  and  Szyf  2007). 

The  epigenetic  control  of  rRNA  gene  expression  has  been  shown  to  be 
dysregulated  in  the  hippocampus  of  abused  suicide  completers  (McGowan 
et al. 2008). Abused suicide completers exhibited lower rRNA expression levels
associated with increased methylation in 21 out of 26 CpGs found within the rRNA
core promoter and UCE compared to controls. From a mechanistic point of view,
these results suggest that methylation represses the interaction of the UBF with the
core promoter sequence and consequently decreases both the recruitment of transcriptional
cofactors and the transcriptional activity of RNA pol I. Interestingly,
these  alterations  seem  to  be  specific  to  the  HPC  since  no  group  difference  in  the 
rRNA methylation pattern was found in the cerebellum. In addition, these results
did not reflect global methylation differences, as genome-wide methylation levels
did  not  reveal  any  methylation  difference  between  abused  suicides  and  controls. 


8.8    The  Tropomyosin-Related  Kinase  B  Receptor  Gene 

The transmembrane protein tropomyosin-related kinase B (TrkB) is the receptor
for  BDNF  and  has  long  been  investigated  in  the  neurobiology  of  mood  and 
related  disorders  (Duman  and  Monteggia  2006;  Kim  et  al.  2007;  Dwivedi 
et al. 2003, 2009). Expression microarray studies have reported lower TrkB expression
in the prefrontal cortex of depressed subjects (Aston et al. 2005; Nakatani
et  al.  2006)  and  antidepressant  treatment  has  been  shown  to  increase  its  expression 
in  cultured  astrocytes  (Mercier  et  al.  2004). 

The  TrkB  gene  is  found  on  chromosome  9  at  locus  q22.1  and  has  five  splice 
variants. Splice variant T1, or TrkB-T1, is an astrocytic truncated form of TrkB
lacking catalytic activity (Rose et al. 2003). Recently, analysis of the methylation
pattern in the promoter of a subset of suicide completers with low levels of TrkB-T1
expression revealed two sites where methylation levels were higher in suicide
completers compared to controls (Ernst et al. 2009b). The methylation pattern at
those two sites was negatively correlated with the expression of TrkB-T1 in suicide
completers, and this effect was specific to the prefrontal cortex, since no significant
difference was found in the cerebellum. Such a pattern of expression and methylation
is thought to increase predisposition to suicidal behaviors. In addition, suicide
completers with low TrkB-T1 expression showed enrichment of H3K27 methylation
in the TrkB promoter (Ernst et al. 2009a), suggesting that the astrocytic variant
of TrkB may be under the control of epigenetic mechanisms involving histone
modifications and DNA methylation. Interestingly, recent data showed that mice
overexpressing the TrkB.T1 variant are more susceptible to chronic social stress
than wild-type mice since the first group exhibits consistent social avoidance
(Razzoli et al. 2011). Together, these data suggest that epigenetic changes in the
TrkB.T1 promoter, inducing expression changes, could define the vulnerability to
chronic  social  stress  and  possibly  to  early-life  adverse  experience. 


8.9    The  GABAergic  System 

The  GABAergic  system  has  been  the  focus  of  many  research  studies  in  postmortem 
brain  samples  of  psychiatric  patients,  and  particularly  individuals  with  histories  of 
depression (Klempan et al. 2009; Merali et al. 2004; Torrey et al. 2005), schizophrenia,
or bipolar disorder, many of whom died by suicide (Guidotti et al. 2000;
Heckers  et  al.  2002;  Volk  et  al.  2000).  For  instance,  reductions  of  reelin  and 
glutamate  decarboxylase  1  (GAD1)  mRNA  (Guidotti  et  al.  2000)  and  an  increase 
in DNMT1 expression (Veldic et al. 2004; Kundakovic et al. 2007) were previously
reported  in  postmortem  brains  of  schizophrenia  and  bipolar  subjects  who  died  by 
suicide.  Promoter  hypermethylation  has  been  reported  for  both  genes,  consistent 
with  the  methylating  role  of  DNMT1  (Grayson  et  al.  2005;  Tamura  et  al.  2007). 

More  recently,  the  hippocampal  expression  of  GAD1  has  been  shown  to  be 
affected  by  maternal  care  in  rats  (Zhang  et  al.  2010).  Indeed,  pups  raised  by  mothers 
providing  low  LG  have  lower  GAD1  hippocampal  expression  associated  with 
promoter  hypermethylation  and  lower  levels  of  H3K9ac,  compared  to  pups  raised 
by  high  LG  mothers.  Interestingly,  maternal  LG  is  also  associated  with  DNMT1 
hippocampal  levels.  Functional  assays  revealed  that  the  transcription  factor  NGFIA 
binds  the  GAD1  promoter  in  order  to  increase  GAD1  expression.  Consequently, 
these  results  suggest  that,  similar  to  the  regulation  of  GR  in  rat  hippocampus,  GAD1 
expression  is  modulated  by  maternal  behavior  via  epigenetic  mechanisms  involving 
DNA  methylation  interfering  with  the  binding  of  activating  transcription  factors 
and  by  chromatin  modifications  (Zhang  et  al.  2010). 

These  findings  are  in  accordance  with  the  study  of  Poulter  et  al.  (2008)  that 
examined  the  expression  of  DNA  methyltransferases  as  well  as  the  GABAA 
receptor α1 subunit in the brain of suicide completers. Three hypermethylated
CpG sites within the α1 subunit promoter were identified in the PFC of suicide
completers  and  negatively  correlated  with  DNMT3b  protein  expression.  Besides 
DNMT3b,  DNMT1  and  DNMT3a  levels  have  also  been  reported  to  be  altered  in  the 
limbic  system  and  brain  stem  of  suicide  completers.  However,  in  this  study  there 
were  no  reports  of  early-life  adversity,  and  thus,  one  cannot  assume  that  these 
effects  would  be  similar  in  abused  suicide  completers. 


8.10    Other  Epigenetic  Alterations  in  Suicide  Brains 

In  the  light  of  the  results  discussed  above,  early-life  adversity  seems  to  modify 
epigenetic  control  of  gene  expression.  These  changes  can  take  place  through 
histone  modifications  and/or  DNA  methylation.  Moreover,  epigenetic  changes 
correlate  with  behavioral  modifications  in  animals  and  humans,  thus  strongly 
suggesting that epigenetics may act as an interface mediating the effect of the
environment on the genome.

Additional  studies  have  focused  on  other  functional  systems  which  have  been 
implicated  in  depression  and  suicide.  Among  these  systems,  the  polyamine  and  the 
serotonergic  systems  are  noteworthy. 

Polyamines  are  ubiquitous  aliphatic  molecules  involved  in  cellular  functions 
including  growth,  division,  and  signaling  cascades  (Gilad  and  Gilad  2003;  Minguet 
et  al.  2008).  The  polyamines  also  play  a  major  role  in  the  regulation  of  stress  (Rhee 
et  al.  2007;  Fiori  and  Turecki  2008),  since  they  are  dependent  on  the  activation  of 
the  HPA  axis  and  the  subsequent  increased  concentrations  of  circulating 
glucocorticoids (Gilad and Gilad 2003). Furthermore, the emergence of the characteristic
adult polyamine stress response correlates with the cessation of the
hyporesponsive period of the HPA axis system (Gilad et al. 1998). Previously,
the expression of spermine synthase (SMS), spermidine/spermine N1-acetyltransferase
(SAT1), and ornithine aminotransferase-like 1 (OATL1) has been shown to be
altered in the limbic system of suicide completers with a history of depressive
disorders (Sequeira et al. 2006, 2007). However, follow-up studies revealed that
epigenetic alterations in the promoter region of genes involved in polyamine
synthesis do not account for these changes in expression. More recently,
site-specific differential methylation has been found in the promoter of ARG2 and
AMD1  (Gross  et  al.  2012)  that  was  inversely  correlated  with  expression  in  BA  44. 

The  serotonergic  system  is  a  neurotransmitter  system  of  great  importance  in 
psychiatry  and  has  been  extensively  investigated  in  depression  and  suicide.  Lower 
concentrations,  binding,  neurotransmission,  and  reuptake  of  serotonin  and  its 
metabolites  are  risk  markers  for  suicidality  and  major  depression  (Cronholm 
et al. 1977; Bhagwagar and Cowen 2008). Among the various serotonergic
receptors,  particular  attention  has  been  given  to  5-HT2a  and  its  gene,  as  an 
important  candidate  in  association  studies  of  suicidal  behavior  (Du  et  al.  2001; 
Turecki  et  al.  1999).  One  of  the  variants  most  commonly  investigated  was  the 
102  C/T  polymorphism,  located  in  exon  1  (Du  et  al.  2000;  de  Luca  et  al.  2007). 
Methylation  in  the  C-allele  variant  in  this  polymorphism  has  previously  been 
associated  with  higher  DNMT1  expression  in  the  brain  and  leukocytes  of  healthy 
subjects  (Polesskaya  et  al.  2006).  Although  methylation  was  reported  as  increased 
in  leukocytes  from  suicide  ideators,  a  nonsignificant  hypomethylation  was  reported 
in  the  PFC  of  suicide  completers  carrying  the  C-allele  (de  Luca  et  al.  2009), 
suggesting that methylation levels may be different in individuals who died by
suicide and those who are planning suicide. On the other hand, the functional
significance  of  this  hypermethylation  in  leukocytes  remains  to  be  explored,  and 
since  significance  levels  were  not  reached  in  brain  tissue,  further  research  is 
required. 


8.11    Genome-Wide  DNA  Methylation  Alteration 
by  Early-Life  Stress 

Overall,  environmental  factors  seem  to  target  the  epigenetic  regulation  of  genes 
involved  in  key  regulatory  processes  such  as  the  HPA  axis,  neurotrophic  factors, 
neurotransmission,  polyamines,  and  protein  synthesis.  However,  while  a  growing 
body  of  evidence  supports  the  contribution  of  epigenetic  factors  translating  the 
effects of ELA on the human genome, there is a real need for large-scale comprehensive
studies assessing genome-wide epigenetic patterns in the context of different
environmental factors. A few of these studies recently reported interesting
findings suggesting that child abuse, while targeting critical genes, may also induce
genome-wide reprogramming of DNA methylation patterns.

Our group recently assessed the impact of child abuse on genome-wide DNA
methylation  signatures  in  gene  promoters  (Labonte  et  al.  2012a).  In  this  study,  we 
compared  hippocampal  DNA  methylation  patterns  between  suicide  completers 
with  a  severe  history  of  child  abuse  (sexual  and/or  physical)  and  healthy  controls. 
We  identified  hundreds  of  sites  that  were  differentially  methylated,  both  with 
increased  and  decreased  methylation,  in  the  hippocampus  of  severely  abused 
suicide  completers.  It  is  interesting  to  note  that  DNA  methylation  levels  in  gene 
promoters  were  inversely  correlated  with  gene  expression  at  the  genome-wide 
level, and differentially methylated sites in abused suicides were enriched in genes
involved  in  neuroplasticity,  a  finding  consistent  with  the  notion  that  abusive 
experiences  during  childhood  lead  to  plastic  changes  in  the  brain  as  a  response  to 
these  negative  environmental  stimuli.  Similar  observations  have  been  made  in 
suicide  completers  (Labonte  et  al.  2013)  who  present  methylation  changes  that 
are  enriched  in  genes  related  to  learning  and  memory,  and  in  peripheral  samples 
from  PTSD  patients  (Uddin  et  al.  2010).  However,  Uddin  and  colleagues  did  not 
restrict  their  analysis  to  promoters  but  rather  measured  DNA  methylation  levels  at 
14,000  CpGs  across  the  genome.  The  analysis  revealed  an  overrepresentation  of 
differentially  methylated  CpGs  in  genes  related  to  immune  function.  This  may  be 
translated into the development of different psychopathological processes. Furthermore,
a previous study performed in the PFC of psychotic and bipolar patients
reported  differential  methylation  in  numerous  sites  that  were  involved  in 
glutamatergic  and  GABAergic  neurotransmission,  brain  development,  and 
response to stress (Mill et al. 2008). Importantly, these studies were
conducted  in  different  tissues  (blood  versus  brain)  and  different  brain  regions 
(HPC  versus  PFC),  which  may  account  for  the  discrepancies  between  studies,  as 
different  tissues  (Ladd-Acosta  et  al.  2007)  and  cell  types  (Deaton  et  al.  2011; 
Iwamoto  et  al.  2011)  have  been  shown  to  exhibit  specific  DNA  methylation 
signatures. 


8.12  Conclusion 


Taken together, this extensive body of research provides significant evidence
that early-life adversity affects molecular mechanisms involved in the
regulation of behavior. These effects involve alterations in DNA methylation and
histone modifications, which are believed to induce behavioral aberrations during
development or later in life by affecting genes involved in crucial neuronal processes.
Studies performed in postmortem brains from suicide completers with a
history  of  childhood  abuse  have  highlighted  several  environmentally  induced 
epigenetic  alterations  in  the  regulatory  regions  of  genes  involved  in  the  response 
to  stress  (see  Table  8.1  for  a  summary).  Similarly,  investigating  the  effect  on 
animals  of  variation  in  early-life  environment  has  revealed  useful  information  for 
expanding  our  understanding  of  the  molecular  mechanisms  involved  in  the  effect  of 
environmental  stressors  on  the  regulation  of  behavior  (Table  8.2).  Together,  these 
findings suggest that epigenetics may act as a mechanism whereby environmental
factors  act  on  the  modulation  of  long-term  behavioral  responses.  In  individuals  with 
particular  predispositions  toward  psychiatric  disorders,  these  alterations  may  help 
trigger  the  expression  of  the  illness.  From  a  therapeutic  point  of  view,  it  is  tempting 
to  speculate  on  the  clinical  potential  these  findings  may  provide.  In  the  future,  they 
could  potentially  lead  to  the  development  of  tools  for  the  identification  of 
individuals at risk, and therefore, the possibility of preventive intervention. However,
there are major challenges in their potential implementation, not the least of
which are access to target tissue in living subjects, modification of epigenetic
profiles, and appropriate delivery of such interventions. Given that this is a relatively
new area of research, the current knowledge is significantly limited.
Integrating genome-wide approaches will provide a more comprehensive view on
the complexity of the relationship between early-life adversity and the psychopathology
of brain disorders. Furthermore, since these studies can provide information
on the molecular nature of stress-induced psychopathologies, future work
should assess whether similar alterations can be found in more accessible tissue.


Chapter 9

Interaction  Between  Genetics 
and  Epigenetics  in  Cancer 

Amanda  Ewart  Toland 


Abstract  Cancer  is  a  disease  caused  by  somatic  mutations  in  key  genes  important 
in  tumorigenesis.  Historically,  the  focus  of  study  has  been  on  the  role  of  mutations, 
including  chromosomal  rearrangements,  in  the  pathogenesis  of  cancer.  About 
30  years  ago,  a  link  was  made  between  methylation  at  promoter  regions  of 
tumor-suppressor genes and cancer development. We now know that abnormal
epigenetic regulation of genes, through changes in methylation, chromatin
remodeling, and expression of noncoding regulatory genes such as microRNAs,
has been linked to cancer development, progression, and metastasis. Despite
the  classical  definition  of  epigenetics  as  being  independent  of  DNA  sequence,  DNA 
sequence  variations,  both  somatic  and  germline,  have  been  shown  to  influence 
specific  epigenetic  events.  This  chapter  focuses  on  the  relationship  between  DNA 
sequence  and  alterations  in  epigenetic  patterning  with  a  particular  emphasis  on 
human  cancers. 


9.1  Introduction 


9.1.1    Cancer  Genetics  and  Genomics 

Cancer  is  a  heterogeneous  disease  characterized  by  cells  that  exhibit  abnormal 
growth  and  can  spread  and  colonize  other  sites  in  the  body.  Cancer  is  described  as  a 
disease  caused  by  mutations  or  changes  to  the  DNA  that  result  in  features  such  as 
sustained proliferation, resistance to cell death, induction of angiogenesis, ability to
replicate indefinitely, evasion of suppressors of growth, and ability to invade and
metastasize (Hanahan and Weinberg 2011). Cancer-inducing mutations can be
inherited  from  one's  parents  through  the  germline  or  can  occur  somatically  during 
one's  lifetime.  Germline  mutations  exist  in  every  cell  of  the  body,  whereas  somatic 
mutations  only  occur  in  the  original  cell  sustaining  the  mutation  and  all  of  its 
daughter  cells.  Tumors  can  contain  tens  to  hundreds  of  different  somatic  mutations 
important  in  tumor  development  and  progression.  Germline  mutations  are  not 
typically  sufficient  to  drive  tumorigenesis  on  their  own,  and  hence  additional 
somatic  mutations  are  necessary. 

There is a continuum of risk associated with DNA variations. Germline changes
to the DNA that are typically rare and impart a large magnitude of risk are considered
to be mutations, also known as highly penetrant, pathogenic, or deleterious
mutations.  These  mutations  are  associated  with  familial  cancer  syndromes  such 
as  Lynch  syndrome  (mutations  in  MLH1,  MSH2,  MSH6),  hereditary  breast  and 
ovarian  cancer  (mutations  in  BRCA1,  BRCA2),  and  von  Hippel-Lindau  syndrome 
(mutations  in  VHL).  Germline  changes  to  the  DNA  that  are  more  common  and  exert 
low  or  moderate  increases  in  risk  are  considered  to  be  low-penetrance  variants  or 
moderate-risk  alleles.  Variations  that  occur  at  frequencies  of  above  1  %  are 
classified  as  polymorphisms,  although  some  low-penetrance  alleles  that  impact 
cancer  risk  may  be  rarer  than  this.  Individually,  low-penetrance  alleles  do  not 
increase  an  individual's  risk  sufficiently  to  be  clinically  useful.  However,  any  one 
individual  may  carry  hundreds  of  risk  alleles  which  collectively,  and  in  the  context 
of  specific  environmental  risk  factors,  can  impart  significant  disease  risk  (Tenesa 
and  Dunlop  2009).  Low-penetrance  variants  may  also  modify  the  risk  in  individuals 
who  carry  a  deleterious  mutation  and  as  such  are  called  modifier  alleles  (Antoniou 
and  Chenevix-Trench  2010). 
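
The cumulative effect of many low-penetrance alleles can be illustrated with a small calculation. The sketch below assumes a simple log-additive (multiplicative) model in which loci act independently; that model, the function name, and the per-allele odds ratios are illustrative assumptions, not values taken from the studies cited above.

```python
import math


def combined_relative_risk(per_allele_or, allele_counts):
    """Combine per-allele odds ratios across loci under a log-additive model.

    per_allele_or  -- hypothetical odds ratio per risk allele at each locus
    allele_counts  -- number of risk alleles carried at each locus (0, 1, or 2)
    """
    if len(per_allele_or) != len(allele_counts):
        raise ValueError("need one allele count per locus")
    if any(n not in (0, 1, 2) for n in allele_counts):
        raise ValueError("allele counts must be 0, 1, or 2")
    # Summing log odds ratios is equivalent to multiplying the odds ratios.
    log_risk = sum(n * math.log(o) for o, n in zip(per_allele_or, allele_counts))
    return math.exp(log_risk)


# Hypothetical individual: 100 loci, per-allele OR of 1.05, one risk allele each.
risk = combined_relative_risk([1.05] * 100, [1] * 100)
print(f"combined relative risk vs. a carrier of no risk alleles: {risk:.1f}")
```

Under these assumptions, an odds ratio of 1.05 per allele is negligible for any single locus, yet the product over 100 loci exceeds a hundredfold relative to a hypothetical person carrying none of the alleles. Real risk models compare individuals with the population average and must account for linkage disequilibrium and environmental factors, which this sketch ignores.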


9.1.2  Epigenetics 

In  addition  to  DNA  sequence  alterations,  changes  in  gene  regulation  through 
epigenetic  modifications  can  also  initiate  and  promote  tumorigenesis.  In  contrast 
to  mutations  which  are  defined  by  the  DNA  sequence  itself,  epimutations  are 
changes  which  do  not  alter  the  DNA  code,  but  modify  the  way  in  which  genes 
are regulated or expressed. The most commonly described epigenetic events are
CpG methylation of DNA base pairs and histone modifications that lead to chromatin
remodeling. Additional epigenetic regulators of gene expression include
microRNAs (miRNAs) and other small noncoding RNAs, paramutations, nuclear
organization, and chromatin looping (Toland 2012). The epigenome, or all the
epigenetic marks in a cell or an individual, is more fluid than the DNA code.
Epigenomes  normally  evolve  over  an  individual's  lifetime  and  are  influenced  by 
developmental  stage,  tissue  type,  environmental  factors,  and  genetics  (Gordon 
et  al.  2012;  Heyn  et  al.  2012;  Meagher  and  Musser  2012).  Like  somatic  mutations, 
epigenetic  modifications  can  be  passed  from  mother  to  daughter  cells  and  can 
be environmentally induced. Unlike germline mutations that can be passed
from generation to generation and persist in the genome, most epigenetic
alterations  occurring  during  one's  lifetime  are  reset  during  gametogenesis  and 
early  embryogenesis  and  are  not  transmitted  to  one's  offspring  (Morgan 
et  al.  2005;  Fleming  et  al.  2008;  Migicovsky  and  Kovalchuk  2011).  There  is  recent 
evidence,  particularly  in  animal  and  plant  models,  that  some  epigenetic  marks  can 
persist or be reset and that soluble factors influencing gene expression and
phenotype may be responsible (Migicovsky and Kovalchuk 2011; Cuzin and
Rassoultzadegan  2010;  Wagner  et  al.  2008;  Grandjean  et  al.  2009). 


9.1.2.1  Methylation 


DNA  nucleotides  can  be  modified  to  induce  alterations  of  gene  regulation.  The  best 
characterized  is  DNA  methylation  which  is  the  addition  of  a  methyl  group  to  a 
cytosine resulting in a 5-methylcytosine. In many cases, the cytosines are located 5'
to a guanine (CpG). CpG islands are cytosine- and guanine-rich regions found in
promoter regions of an estimated 70% of genes and are also located near
retrotransposable elements and repetitive DNA (Widschwendter and Jones 2002). The
methylation process is characterized by the transfer of a methyl group from
S-adenosylmethionine (SAM) by DNA methyltransferases (DNMTs) to the C5
position  of  a  cytosine.  Methylation  of  gene  promoters  is  associated  with  silencing 
of  transcription  and  hypomethylation  with  gene  expression.  Retrotransposons  and 
repetitive  DNA-containing  regions  are  typically  hypermethylated  (Jintaridth  and 
Mutirangura 2010). There is a strong correlation between DNA methylation of
promoter regions and repressive histone modifications such as H3K27me3 that
result in inactive chromatin states. Methylation of intronic regions has recently
been  recognized  in  actively  transcribed  genes  and  been  postulated  to  have  a  role  in 
pre-mRNA  splicing  (Shukla  et  al.  2011;  Sati  et  al.  2012).  A  second  type  of  DNA 
methylation,  5-hydroxymethylcytosine,  has  also  been  identified  in  gene  promoters 
and  intragenic  regions,  but  the  biological  role  of  this  modification  is  not  fully 
understood  (Jin  et  al.  2011;  Ku  et  al.  2011). 
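
The notion of a CpG-rich region can be made concrete with a short calculation. The sketch below counts CpG dinucleotides in a sequence and computes the observed/expected CpG ratio. The function names are hypothetical, and the thresholds in `looks_like_cpg_island` (length of at least 200 bp, GC fraction of at least 0.5, observed/expected ratio of at least 0.6) are one commonly used heuristic supplied here as an assumption; they are not taken from this chapter.

```python
def cpg_stats(seq):
    """Return (CpG count, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently: (C * G) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return cpg, gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed heuristic thresholds to a single candidate region."""
    _, gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


# Toy sequences, not real genomic DNA.
print(cpg_stats("CGCGCG"))
print(looks_like_cpg_island("CG" * 100), looks_like_cpg_island("AT" * 100))
```

Genome-wide island callers slide a window along the sequence and merge adjacent hits; the sketch evaluates only one region at a time and says nothing about whether the cytosines are actually methylated.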


9.1.2.2  Chromatin 


Chromatin  is  a  means  of  compacting  DNA.  It  is  also  critical  for  determining 
accessibility  of  DNA  to  transcriptional  machinery  based  on  the  chromatin  state. 
The basic unit of chromatin, the nucleosome, consists of 147 bp of DNA wrapped
around a histone protein octamer made up of four histone proteins, H2A, H2B, H3,
and H4, each in duplicate (Kornberg and Lorch 1999). Transcriptionally active DNA occurs in
open  or  non-condensed  chromatin  (euchromatin)  whereas  transcriptionally  silenced 
DNA  is  found  in  highly  compacted  chromatin  (heterochromatin).  Chromatin  states 
are  determined  by  a  large  number  of  histone  posttranslational  modifications  that 
include  acetylation,  methylation,  ubiquitination,  phosphorylation,  and  sumoylation 
(Fullgrabe et al. 2011).


9.1.2.3    Chromatin Looping and Nuclear Positioning

Studies  have  indicated  that  the  position  of  DNA  in  the  nucleus,  particularly  in 
relationship  to  promoters  and  enhancers,  may  also  be  critical  for  gene  regulation 
(Meister et al. 2010; Schoenfelder et al. 2010; Brickner et al. 2012; Burns and
Wente 2012). This type of epigenetic regulation, also called position-effect
variegation, was first described in Drosophila but has been found in several other
organisms including humans (Guffei et al. 2010). In addition to position in the
nucleus, interactions between chromosomal regions have been detected using new
technologies such as chromosome conformation capture (3C), which cross-links
DNA for downstream sequencing. Translocations, a hallmark of many cancer
genomes,  have  been  mapped  to  chromosomes  sharing  nuclear  space  suggesting 
the  influence  of  the  nuclear  organization  on  specific  translocations  that  occur  in 
cancer  (Zhang  et  al.  2012a).  Single-nucleotide  polymorphisms  (SNPs)  have  been 
identified  from  genome-wide  association  studies  that  map  to  enhancer  elements  and 
disrupt  normal  regulation,  possibly  through  interference  with  chromosome  looping 
(Akhtar-Zaidi  et  al.  2012).  Examples  include  variants  that  influence  SOX9  in 
prostate  cancer  and  c-MYC  in  colorectal  tumors  (Wright  et  al.  2010;  Zhang 
et  al.  2012b). 


9.1.2.4    Small  Noncoding  RNA 

Another  category  of  epigenetic  regulation  is  noncoding  RNAs.  miRNAs  are  small 
noncoding  RNAs  of  18-23  nucleotides  that  regulate  gene  expression  of  specific 
genes by targeting mRNAs for degradation or inhibiting their translation.
Additional noncoding RNAs, including long noncoding RNAs (lncRNAs), small
nucleolar RNAs (snoRNAs), piwi-interacting RNAs (piRNAs), promoter-associated
small  RNAs  (pasRNAs),  transcription  initiation  RNAs  (tiRNAs),  and  endogenous 
small  interfering  RNAs  (siRNAs)  (Taft  et  al.  2010),  are  all  associated  with  gene 
regulation  via  different  mechanisms  (Toland  2012).  There  is  emerging  evidence 
that  some  small  noncoding  RNAs  may  be  transmitted  to  offspring  through  the 
sperm  (Hamatani  2012). 


9.2    Cancer  and  Epigenetics 

Since 1983, when Feinberg and Vogelstein first described aberrant DNA
methylation in cancers (Feinberg and Vogelstein 1983), countless studies have detailed the
abnormal epigenetic patterning occurring in tumors. DNA hypomethylation of the
genome and hypermethylation of critical tumor-suppressor genes are early events in
several  tumor  types  (Ehrlich  2006,  2009).  Genomic  instability,  another  feature  of 
tumors, has been hypothesized to be caused in part by hypomethylation, possibly
through activation of viral genes or retrotransposable elements (Rodriguez
et  al.  2006;  Daskalos  et  al.  2009).  Whereas  many  of  the  epigenetic  alterations  that 
occur  during  tumorigenesis  are  thought  to  arise  through  environmental  modifiers 
and  somatic  mutations  in  critical  epigenetic  regulators,  some  alterations  are  due  to 
germline variations or mutations. There are several means by which alterations to
the DNA sequence can influence the epigenetic patterning and status of a cell, including
somatic  mutations,  germline  mutations,  germline  variants,  somatic  copy  number 
losses  and  amplifications,  and  copy  number  variants. 


9.3    Changes  in  Germline  DNA  Leading  to  Aberrant 
Epigenetic  Regulation  in  Tumorigenesis 

9.3.1    Germline  Mutations  Leading  to  Cancer  Syndromes 

Germline  mutations  in  genes  responsible  for  normal  epigenetic  functioning  have 
been  linked  to  increased  risk  of  developing  cancer.  Inherited  mutations  in  DICER,  a 
protein  important  in  the  processing  of  miRNA,  are  associated  with  familial 
pleuropulmonary blastoma (PPB)-predisposition syndrome (Hill et al. 2009) and
are also found in multilocular cystic nephroma (Bahubeshi et al. 2010). Truncating
somatic mutations in TARBP2, a component of the DICER complex, are found in
colon and endometrial cell lines and primary tumors; these mutations result in
aberrant miRNA processing (Melo et al. 2009).

In  addition  to  mutations  directly  affecting  the  epigenetic  machinery,  inherited 
mutations  have  been  associated  with  other  cancer  syndromes.  The  best  studied 
hereditary  cancer  syndrome  with  links  to  inherited  epigenetic  alterations  is  Lynch 
syndrome,  a  hereditary  colorectal  cancer  syndrome  caused  by  germline  mutations 
in  the  mismatch  repair  genes  MSH2,  MLH1,  MSH6,  and  PMS2.  There  are  multiple 
reports  of  epigenetic  silencing  of  MLH1  which  persisted  through  multiple 
generations  and  led  to  Lynch  syndrome  cancers  (Hitchins  et  al.  2005;  Goel 
et  al.  2011;  Crepin  et  al.  2012).  The  original  descriptions  of  this  phenomenon 
postulated  that  the  methylation  was  not  tied  to  a  particular  MLH1  mutation  as  no 
sequence  changes  were  initially  identified  (Suter  et  al.  2004;  Hitchins  et  al.  2011). 
Subsequently,  one  mechanism  for  MLH1  epigenetic  silencing  in  multiple  Lynch 
syndrome  families  was  identified  as  a  single-nucleotide  change  in  the  5'UTR  of 
MLH1  (Hitchins  et  al.  2011). 

Another example of a germline epigenetic alteration that leads to a high risk of
cancer involves DAPK1. DAPK1, a mediator of apoptosis, is epigenetically silenced via
promoter methylation in familial chronic lymphocytic leukemia (Raval et al. 2007).
A promoter mutation was found to alter HOXB7 binding, leading to
subsequent promoter methylation. Several studies have assessed hereditary breast
and  ovarian  cancer  families  for  constitutive  promoter  methylation  suggestive  of 
an epimutation similar to this. Initial studies using a small number of BRCA1/2
mutation-negative families did not find evidence of promoter methylation
(Chen et al. 2006), but larger studies suggest that a small percentage of these
individuals may have inherited silencing of these genes (Snell et al. 2008; Wong
et al. 2011; Hansmann et al. 2012). These studies suggest that other high-risk cancer
syndromes  may  be  due  in  part  to  epigenetic  silencing  although  the  etiology  and  the 
frequency  of  germline  mutations  that  influence  epigenetic  patterning  in  hereditary 
cancer  remain  to  be  determined. 


9.3.2    Single-Nucleotide Polymorphisms and Allele-Specific
Epigenetic  Modifications 

Several studies have identified genes showing allele-specific expression; some of
the allele-specific expression has been associated with allele-specific methylation,
methylation that occurs preferentially on one allele compared to the other.
Approximately 35,000 sites across the genome are estimated to undergo allele-specific
methylation of which some may differentially affect expression (Schalkwyk
et al. 2010). Methylation of key tumor-suppressor genes has been identified in the
lungs of smokers. MGMT (O6-methylguanine-DNA methyltransferase) is a repair
enzyme whose activity is thought to be protective against the mutagenic effects of
carcinogens that induce O6-methylguanines. MGMT is silenced via promoter
methylation in several cancers (Hegi et al. 2008). An SNP in the promoter region of
MGMT, rs16906252, is associated with increased MGMT methylation in smokers
and  in  colon  cancer  (Hawkins  et  al.  2009;  Leng  et  al.  2011).  Allele-specific 
methylation  has  been  observed  in  genes  with  other  roles  in  cancer.  For  example, 
allele-specific  methylation  of  ABCB1,  a  gene  whose  expression  is  implicated  in 
drug  resistance,  has  been  observed  in  breast  cancer  cell  lines  (Reed  et  al.  2010). 
Allele-specific  expression  can  also  be  attributed  to  allele-specific  difference  in 
chromatin  modifications.  Allele-specific  histone  modifications  have  been  identified 
in  humans  and  chromatin  states  have  been  observed  to  vary  between  families 
(Kadota  et  al.  2007;  Prendergast  et  al.  2012).  These  loci  tend  to  correlate  with 
regions demonstrating allele-specific expression. An association between
allele-specific methylation and H3K27 methylation has been identified (Statham
et  al.  2012).  It  is  probable  that  future  studies  will  identify  genetic  variants 
associated  with  allele-specific  expression  via  modifications  to  the  chromatin  that 
will  affect  cancer  risk  or  development. 
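
Allele-specific methylation at a heterozygous SNP is typically inferred from sequencing reads that can be assigned to one allele or the other. The following sketch shows one way such a comparison might be tested, using a two-sided Fisher's exact test on a 2x2 table of read counts. The read counts, the function name, and the choice of test are assumptions made for illustration; none of them comes from the studies cited above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two alleles; columns are methylated and unmethylated read counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x methylated reads on the first allele.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical reads at one heterozygous SNP:
# allele A: 18 methylated, 2 unmethylated; allele G: 3 methylated, 17 unmethylated.
p_value = fisher_exact_two_sided(18, 2, 3, 17)
print(f"p = {p_value:.2e}")
```

A genome-wide screen of the scale described above would need correction for multiple testing across all sites examined, which the sketch does not attempt.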


9.3.3    Copy  Number  Variations  and  Germline  Mutations 
Leading  to  Aberrant  Methylation 

SNPs  are  not  the  only  type  of  genetic  variation  that  can  impact  the  epigenome. 
Copy  number  variations,  polymorphic  gains  and  losses  of  DNA,  can  also  impact 
normal epigenetic patterning and cancer development. Germline mutations
resulting in loss or gain of large segments of DNA can also cause abnormal
epigenetic  regulation  of  genes  important  in  cancer  risk.  A  number  of  families 
with  a  clinical  diagnosis  of  Lynch  syndrome  were  identified  that  appeared  to 
have  familial  methylation  of  the  MSH2  gene  without  any  promoter  or  coding 
defects  (Chan  et  al.  2006).  One  explanation  of  these  results  was  that  there  was  a 
heritable epimutation or allele-specific silencing of the MSH2 gene in the absence
of  DNA  alterations.  Further  analysis,  however,  revealed  that  deletions  of  the  last 
two  exons  of  the  EPCAM  gene,  just  upstream  of  MSH2,  induced  methylation  of 
MSH2  resulting  in  Lynch  syndrome  (Ligtenberg  et  al.  2009;  Niessen  et  al.  2009). 
Another example of a copy number change influencing colorectal cancer risk is
duplication of PTPRJ. PTPRJ has been implicated as a candidate colorectal cancer
susceptibility gene from mouse models (Ruivenkamp et al. 2002). A 170-kb
intragenic duplication including the 5' end of the PTPRJ gene and resulting in
promoter methylation was found in DNA of a family with early-onset colorectal cancer (CRC)
(Venkatachalam  et  al.  2010).  As  additional  studies  are  conducted,  it  is  likely  that 
additional  copy  number  variations  or  mutations  will  be  identified  that  result  in 
abnormal  epigenetic  patterning  and  increased  cancer  risk. 


9.3.4    Single-Nucleotide  Polymorphisms  and  microRNAs 

miRNAs  consist  of  18-23  nucleotides  that  contain  a  seed  region  of  6-9  nucleotides. 
The  seed  region  is  believed  to  be  critical  for  binding  to  target  mRNAs.  Thus,  DNA 
variations  that  occur  in  the  miRNA  or  the  seed  region  of  the  potential  target  have 
the potential to disrupt the miRNA's ability to recognize and bind to its targets
(Landi et al. 2012a). Conversely, SNPs in the 3'UTR of a target mRNA can interfere with binding of
miRNAs.  There  are  several  examples  of  SNPs  in  miRNA-binding  sites  or  miRNAs 
themselves  that  have  been  associated  with  an  increase  in  cancer  risk  (Table  9.1) 
(Landi  et  al.  2012b).  Three  miRNAs,  miR-196a2,  miR-499,  and  miR-146a,  contain 
SNPs  in  their  pre-miRNA  that  are  thought  to  affect  processing  and  have  been 
associated with increased risk of cancer. In 2008, rs2910164 in pre-miR-146a
was  found  to  decrease  the  expression  of  mature  miR-146a  and  increase  the  risk  of 
developing  papillary  thyroid  cancer  (Jazdzewski  et  al.  2008).  It  was  later  associated 
with  an  increased  risk  of  prostate,  breast,  ovarian,  and  gastric  cancers  among  other 
cancer  types  (Shen  et  al.  2008;  Xu  et  al.  2010;  Zeng  et  al.  2010).  One  of  the  first 
genes  to  be  identified  with  a  3'UTR  variant  affecting  miRNA  binding  was  KRAS. 
KRAS  is  an  oncogene  which  is  mutated  at  a  high  frequency  in  cancers  of  the  colon, 
lung, and pancreas. An SNP in the 3'UTR of KRAS interferes with let-7 binding to KRAS, leading to an
increase in KRAS expression. This SNP, rs61764370, has been associated with an increase
in risk of non-small-cell lung cancer and response to therapy for metastatic
colorectal cancer (Chin et al. 2008; Zhang et al. 2011b). Another interesting gene
containing a variant reported to affect miRNA binding is SET8. SET8 is responsible
for methylation of TP53, the "guardian of the genome" which regulates genomic
stability. An SNP in the 3'UTR of SET8 affects binding of miR-502 resulting in
altered SET8 expression. This SNP, rs16917496, is associated with younger age
of breast cancer onset (Song et al. 2009). In addition to SNPs in the miRNAs or their
targets, SNPs in promoter regions of miRNAs, such as miR-34b/c, have also been
associated with cancer. These studies demonstrate that genetic variants can have a
significant impact on gene regulation via miRNAs and influence cancer risk.

Table 9.1  Germline variants affecting miRNAs or miRNA targets

Location of SNP  SNP         Gene/miRNA      Cancer type                 References
miRNA precursor  rs2910164   miR-146a        Prostate, thyroid, breast,  Jazdzewski et al. (2008), Shen et al. (2008),
                                             ovarian, gastric            Xu et al. (2010), Zeng et al. (2010),
                                                                         Wang et al. (2012)
miRNA            rs2292832   miR-149         HNSCC                       Liu et al. (2010)
miRNA precursor              miR-499         Breast                      Hu et al. (2009)
miRNA            rs11614913  miR-196a2       Breast, lung                Hu et al. (2009), Tian et al. (2009)
miRNA            rs16917496  miR-502/SET8    Breast                      Song et al. (2009)
miRNA            rs4919510   miR-608         CRC recurrence/death        Lin et al. (2012)
                 rs213210    miR-219-1       CRC death                   Lin et al. (2012)
3'UTR            rs61764370  KRAS/let-7      Lung cancer                 Chin et al. (2008)
3'UTR            rs1421      EPCAM/miR-1183  Breast cancer risk          Jiang et al. (2011)
3'UTR            rs1044219   RYR3/miR-367    Breast cancer, survival     Zhang et al. (2011a)
3'UTR            rs8126      TNFAIP2         HNSCC                       Liu et al. (2011)
3'UTR                        IQGAP1/miR-124  Gastric cancer              Zheng et al. (2011)
Promoter         rs4938723   miR-34b/c       Hepatocellular carcinoma    Xu et al. (2011)
3'UTR            rs709805    KIAA0182        CRC                         Landi et al. (2012b)
3'UTR            rs354476    NUP210          CRC                         Landi et al. (2012b)
3'UTR            rs799917    BRCA1/miR-638   Breast                      Nicoloso et al. (2010)
3'UTR            rs334248    TGFR1/miR-187   Breast                      Nicoloso et al. (2010)

HNSCC head and neck squamous cell carcinoma, CRC colorectal cancer
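
The effect of a 3'UTR variant on seed pairing can be sketched in a few lines. The example below takes the seed to be nucleotides 2-8 of the miRNA, which is one common convention and an assumption here, and checks whether a 3'UTR contains the perfectly complementary site. The let-7a sequence is the widely reported mature sequence; the two UTR fragments are invented for illustration and are not the real KRAS 3'UTR or its variant allele.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_match_site(mirna, start=2, end=8):
    """Return the mRNA site (5'->3') that pairs perfectly with the miRNA seed.

    The seed is assumed to span miRNA positions start..end (1-based, inclusive).
    """
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    # The target site is the reverse complement of the seed.
    return seed.translate(COMPLEMENT)[::-1]


def has_seed_site(utr, mirna):
    """True if the UTR contains a perfect match to the miRNA seed."""
    return seed_match_site(mirna) in utr.upper().replace("T", "U")


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"   # mature let-7a
REF_UTR = "AAGCUACCUCAGU"          # hypothetical reference allele
ALT_UTR = "AAGCUACGUCAGU"          # hypothetical variant allele (C>G in the site)

print(seed_match_site(LET7A))
print(has_seed_site(REF_UTR, LET7A), has_seed_site(ALT_UTR, LET7A))
```

In this toy case a single-base change removes the only perfect seed match. Real target prediction also weighs G:U wobble pairing, pairing outside the seed, and the sequence context around the site, none of which is modeled here.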


9.4  Somatic Mutations and Epigenetic Patterning

9.4.1    Epigenetic Patterning of the Cancer Genome

Although  many  studies  in  the  past  have  focused  on  specific  loci  or  genes  in  studying 
epigenetic  changes  in  cancer,  it  is  now  recognized  that  genome-wide  changes 
to normal epigenetic patterning during cancer development are just as frequent.
There is also a link between epigenetics and genomic aberrations. Hypomethylation
of  the  genome  with  specific  hypermethylation  of  gene  promoters  is  typical  of 
several  cancers  (Ehrlich  2006).  Hypomethylation  of  the  genome  is  associated 
with  an  increase  in  DNA  instability  resulting  in  aneuploidy  and  the  potential  for 
additional  epigenetic  alterations  (Ehrlich  2009;  Rodriguez  et  al.  2006).  In  addition 
to  methylation  status,  chromatin  status,  specifically  H3K9me3  levels,  is  associated 
with  an  increase  in  the  mutation  rate  (Schuster-Bockler  and  Lehner  2012; 
Bartolomei 2009). Thus, genomic changes that occur during tumorigenesis reflect
a complex interaction between the epigenome and genome.


9.4.2    Mutations  in  Epigenetic  Regulators 

Cancer  is  caused  by  a  series  of  mutations  characterized  by  loss-of-function 
mutations  of  tumor-suppressor  genes  such  as  TP53  and  activating  mutations  or 
amplification of oncogenes such as KRAS. Recently, mutations in epigenetic
regulatory genes, such as DNMT3A, EZH2, and TET2, have also been demonstrated to
play  a  critical  role  in  tumorigenesis  (Chase  and  Cross  2011;  McCabe  et  al.  2012; 
Perez  et  al.  2012;  Shih  et  al.  2012).  EZH2  is  an  important  component  of  the 
polycomb  repressive  complex  2  (PRC2)  which  is  part  of  the  machinery  linking 
promoter methylation and chromatin silencing by catalyzing H3K27
trimethylation. In B-cell lymphomas, Y641F mutations in EZH2 lead to increased
levels of trimethylated histone H3K27, a marker of gene silencing (Yap et al. 2011).
Other activating mutations of EZH2 have also been described. DNMT3A is a
critical enzyme important in the transfer of methyl groups to CpG dinucleotides
leading to methylation. Whole-genome sequencing revealed that 22% of
patients with acute myeloid leukemia have mutations in DNMT3A (Ley et al. 2010).
Mutations  in  this  gene  correlated  with  decreased  methylation  of  multiple  genes, 
although  the  exact  implications  of  the  induced  hypomethylation  are  not  yet  clear.  It 
is likely that mutations in additional genes that control epigenetics will be found as
genome sequencing of tumors for diagnostic and treatment decisions becomes more
routine.  From  what  we  know  thus  far,  it  is  clear  that  somatic  mutations  lead  to 
changes  in  epigenetic  patterning  on  a  genomic  level  and  vice  versa. 


9.4.3    Chromosomal  Breakpoints  and  Methylation 

Common  types  of  mutations  observed  in  cancer  are  chromosomal  rearrangements 
and  translocations.  Many  of  the  breakpoints  defining  these  regions  are  common 
both  between  and  within  cancer  types.  Thus,  there  has  been  much  speculation  about 
the  mechanism  or  the  features  of  DNA  sequence  that  characterize  breakpoint  hot 
spots.  One  study  of  over  100  breast  tumors  compared  copy  number  data  with 
methylation profiling and identified 93 of 217 common breakpoint loci as being
differentially methylated in tumors (Tang et al. 2012). The majority of these loci
showed  hypomethylation  and  about  a  third  of  them  were  located  within  3  Mb  of  Alu 
SINE  repetitive  elements.  In  multiple  myeloma,  a  correlation  between  the  density 
of  LINE-1  elements  and  common  breakpoints  was  identified  (Aoki  et  al.  2012). 
In  addition,  the  relative  degree  of  methylation  of  these  LINE-1  elements  was 
associated  with  poorer  prognosis  in  this  disease.  Head  and  neck  cancers  positive 
for  human  papillomavirus  infection  also  show  a  strong  correlation  with 
hypomethylation  of  LINE  elements  and  loss  of  heterozygosity  (Richards 
et  al.  2009).  These  studies  further  support  the  link  between  hypomethylation  and 
genomic  instability  in  tumors. 


9.4.4    Somatic Events Leading to microRNA Alterations

miRNA  expression  is  frequently  perturbed  in  cancers.  Deletions  of  DNA  containing 
miRNAs  are  one  type  of  mutation  leading  to  aberrant  miRNA  expression. 
Translocations,  or  juxtaposition  of  two  different  chromosomal  regions,  have  also 
been  associated  with  alterations  in  miRNA  expression.  In  fact,  the  first  miRNAs 
associated with cancer, miR-15 and miR-16, were discovered because of a
translocation at 13q14 in patients with chronic lymphocytic leukemia that led to a small
deletion of 27 kb in a region devoid of genes (Calin et al. 2002). A translocation
between chromosomes 15 and 17 is found in a subset of individuals with acute
myeloid leukemia. These translocations are associated with elevated levels of
miR-127, miR-299, miR-370, miR-323, and miR-154 which all map to the same locus on
human 14q32 (Dixon-McIver et al. 2008). The mechanism for this increase in
expression  is  not  known,  but  one  hypothesis  is  that  the  translocation  leads  to  a 
change  in  methylation  or  acetylation  status  of  this  locus. 


9.5    Imprinting  and  Cancer 

Imprinting  is  the  differential  epigenetic  regulation  of  a  gene  which  is  dependent  on 
the parent from which the gene or the allele is inherited. This results in a
gene-dosage effect and is one example of methylation patterning that is reset early in
development.  Several  genomic  loci,  encompassing  approximately  100  genes,  are 
known  to  be  maternally  or  paternally  imprinted  and  additional  loci  have  recently 
been  described  as  being  imprinted  or  silenced  on  one  allele  in  human  and  mouse 
models  (Schuster-Bockler  and  Lehner  2012;  Xie  et  al.  2012).  Frequently,  imprinted 
genes reside in clusters that contain an imprinting control region that shows
parent-specific methylation. Loss of imprinting (LOI) or abnormal imprinting of some of
these regions has been linked to the development of cancer.

One of the first loci in the genome identified as being imprinted is 11p15.5 (Zhang
et  al.  1993;  Moulton  et  al.  1996).  Wilms'  tumor,  a  childhood  tumor,  is  associated 
with LOI of the maternally expressed noncoding H19 gene and the paternally
expressed insulin-like growth factor 2 gene (IGF2) on chromosome 11p15.5
(Scott et al. 2008). LOI of IGF2 is associated with several additional cancers
including prostate, colon, and gastric (Cui et al. 2002, 2003; Bhusari et al. 2011;
Zuo et al. 2011). There is some evidence that LOI of IGF2 occurs in multiple tissues
of an individual suggesting an early event in development (Cruz-Correa
et al. 2004). Other developmental disorders associated with defects in imprinting
or LOI include Prader-Willi syndrome (PWS) and McCune-Albright syndrome.
PWS is caused by aberrant imprinting at the 15q11-13 locus. Cancers associated
with PWS include myeloid leukemias (Davies et al. 2003). The risks of developing
thyroid cancer, osteosarcoma, skin cancer, and neurofibromatosis are elevated in
individuals with McCune-Albright syndrome who have imprinting alterations for
GNAS on chromosome 20q13.2 (Chanson et al. 2007).


9.5.2    Imprinting  of  Noncoding  Elements 

Genes  are  not  the  only  important  elements  in  the  genome  that  show  imprinting.  An 
estimated  7  %  of  known  human  miRNAs  map  to  imprinting  regions  (Labialle  and 
Cavaille 2011) and many of these show aberrant expression in cancers (Girardot
et al. 2012). A pilot study of parent-child trios assessed the methylation status of
LINE-1 elements in the blood and identified a high correlation of methylation levels
between  mother-daughter  and  father-daughter  pairs,  but  only  a  weak  correlation 
between  mother-son  and  father-son  pairs  (Mirabello  et  al.  2010).  In  general,  males 
showed  a  higher  rate  of  LINE-1  methylation  compared  to  females.  Interestingly,  in 
father-son  pairs  in  which  both  the  father  and  son  had  a  diagnosis  of  testicular  germ 
cell  tumors,  there  was  a  correlation  of  methylation  levels  between  the  father  and 
son.  LINE-1  hypomethylation  showed  a  marginal  increase  in  the  risk  for  testicular 
germ cell tumors. Mouse studies have also shown a link between inherited
epigenetic modifiers and the risk of testicular germ cell tumors, supporting the role
of  familial  shared  epigenetic  patterning  in  susceptibility  to  this  cancer  (Lam 
et  al.  2007). 


9.5.3    Cancer  Risk  and  Aberrant  Imprinting 

Although  most  described  cases  of  aberrant  imprinting  and  cancer  are  associated 
with syndromes, there are recent reports of parent-of-origin effects (POE) associated
with low-penetrance susceptibility alleles. The extent of POE is unknown as most
genome-wide  association  studies  are  not  able  to  determine  which  allele  was 
inherited  from  which  parent  because  only  cases  and  controls  are  genotyped  and 
not relatives of cases. deCODE Genetics genotyped a large percentage of the fairly
homogeneous Icelandic population which allowed reconstruction of common
haplotypes and the ability to generate information on parent of origin. A few
SNPs evaluated in the context of parental haplotypes show apparent allele-specific
POE risks for cancer. One example is the C allele of rs3817198 which maps to
11p15, a region known to be imprinted. The C allele is not associated with breast
cancer when inherited from the mother, but shows a modest increase in risk when
inherited from the father (Kong et al. 2009). Similarly, the T allele of rs157935 on
7q32  is  associated  with  basal  cell  carcinoma  when  inherited  from  the  father,  but  not 
when  inherited  from  the  mother.  These  studies  suggest  that  being  able  to  determine 
the  parent  of  origin  of  alleles  may  uncover  low-penetrance  risk  alleles.  Whereas  the 
mechanism  of  the  risk  being  dependent  on  parent  of  origin  is  assumed  to  be 
epigenetic,  this  has  not  been  fully  evaluated. 
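
To make the parent-of-origin problem concrete, the following is a minimal sketch of how a genotyped parent-child trio can resolve which allele came from which parent at a single biallelic SNP. It is an illustration only and is not the haplotype-based approach used in the Icelandic studies; the function name `parental_origin` and the tuple encoding of genotypes are assumptions made for this example.

```python
from typing import Optional, Tuple

# Hypothetical encoding: a genotype is an unordered pair of alleles, e.g. ("C", "T").
Genotype = Tuple[str, str]


def parental_origin(child: Genotype, mother: Genotype,
                    father: Genotype) -> Optional[Tuple[str, str]]:
    """Return (maternal_allele, paternal_allele) for the child.

    Returns None when the trio is uninformative (more than one assignment
    is possible) or inconsistent with Mendelian inheritance.
    """
    a, b = child
    candidates = set()
    # Try both ways of assigning the child's two alleles to the two parents.
    for maternal, paternal in ((a, b), (b, a)):
        if maternal in mother and paternal in father:
            candidates.add((maternal, paternal))
    if len(candidates) == 1:
        return candidates.pop()
    return None


if __name__ == "__main__":
    # Informative trio: the mother can only have transmitted T.
    print(parental_origin(("C", "T"), ("T", "T"), ("C", "T")))
    # Uninformative trio: all three individuals are heterozygous.
    print(parental_origin(("C", "T"), ("C", "T"), ("C", "T")))
```

The all-heterozygous trio shows why genotypes of cases and controls alone cannot assign parental origin, and why even trio data leave some transmissions unresolved without haplotype information.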

Sporadic  cancers  are  not  the  only  ones  that  exhibit  POE.  Germline  mutations  in 
the genes for subunits of enzymes in the respiratory chain complex II (succinate-ubiquinone
oxidoreductase, succinate dehydrogenase) can lead to hereditary
paraganglioma/pheochromocytoma (PGL/PCC), which are highly vascularized
tumors of the paraganglia (Fishbein and Nathanson 2012). Five different types of
hereditary PGL/PCC have been described. Whereas PGL types 3, 4, and 5 are
inherited as autosomal dominant traits with mutations in SDHC, SDHB, and SDHA,
PGL types 1 and 2 only occur when SDHD or SDHAF2 mutations are inherited
from the father. This suggests imprinting or silencing of the maternal allele.
However, studies to examine methylation status and expression patterns of SDHD
have not uncovered the mechanism for the POE (Muller 2011). In tumors, loss of the
maternal allele is required, suggesting that there is some contribution of the maternal
allele  in  suppressing  tumorigenesis.  A  model  for  partial  epigenetic  silencing  of  the 
maternal  allele  has  been  proposed  but  the  mechanism  for  this  has  not  been 
established. 

The  penetrance  of  hereditary  syndromes,  such  as  Lynch  syndrome,  may  also 
depend  on  the  parent  transmitting  the  mutation  (van  Vliet  et  al.  2011).  A  study  of 
over 400 carriers of mutations in mismatch repair genes, MLH1, MSH2, and MSH6,
showed  that  the  average  age  of  diagnosis  for  male  carriers  or  obligate  carriers  with 
maternally  derived  mutations  was  6  years  earlier  than  those  whose  mutation  was 
paternally  derived.  This  observation  also  extended  to  higher  CRC  incidences  in 
males  whose  mutations  were  inherited  from  their  mothers.  There  was  no  difference 
seen  in  females  whose  mutations  came  from  their  mother  versus  their  father.  Other 
studies  have  found  that  the  effect  was  seen  in  female  offspring  (Lindor  et  al.  2010). 
Whether  these  effects  are  epigenetic  or  due  to  in  utero  environmental  exposures  has 
not  been  determined. 

The retinoblastoma gene, RB1, is a classic tumor suppressor that is inactivated
through mutations or DNA deletions in many tumors. Inherited mutations in the RB1
tumor-suppressor gene lead to hereditary retinoblastoma, although variability in
penetrance and age of diagnosis has been noted in some families (Harbour 2001). In
2009, a CpG island in intron 2 of RB1 was found to be preferentially methylated on the
maternally inherited chromosome and unmethylated on the paternally inherited
chromosome (Kanber et al. 2009). RB1 shows alternative splicing and intron 2 acts as a
promoter  for  one  of  these  splice  forms,  but  only  on  the  chromosome  inherited  from  the 
father.  The  alternative  splice  form  is  not  expressed  from  the  maternal  chromosome. 
The  clinical  implications  of  this  imprinting  and  splice-form  variation  are  not  known, 
but  it  may  impact  some  of  the  differences  observed  within  families  with  germline 
mutations. 


9.6    Transgenerational  Epigenetic  Events 

Transgenerational  epigenetic  events  refer  to  the  persistence  of  an  epigenetic  state 
through  the  germline  to  manifest  in  the  next  generation  (Fleming  et  al.  2008). 
Abnormal  epigenetic  silencing  or  activation  of  genes  was  not  thought  to  be 
inherited  through  the  germline  because  during  the  formation  of  gametes  and 
again post-fertilization, epigenetic marks including DNA methylation and histone
modifications are erased and reprogrammed (Migicovsky and Kovalchuk 2011). Germline DNA
variants can confer differences in epigenetic regulation through effects on
methylation and miRNA binding, which can be inherited. There is also evidence from
animal and plant models that epigenetic effects can escape reprogramming and
persist through the germline in the absence of known DNA sequence alterations
(Nilsson et al. 2012). In C. elegans, piRNAs can induce stable long-term epigenetic
silencing  of  genes  that  can  persist  over  many  generations  (Ashe  et  al.  2012).  This 
has  not  been  shown  for  humans,  but  is  one  potential  mechanism  for  the  observation 
of  transgenerational  epigenetic  modifications  that  persist  for  multiple  generations. 

In  humans,  the  evidence  for  multigenerational  epigenetic  inheritance  not  based 
on  DNA  sequence  is  less  clear.  To  date,  studies  to  test  this  hypothesis  are  difficult  as 
any  transgenerational  epigenetic  effects  need  to  persist  beyond  the  grandparent-to- 
grandchild  transmission.  This  is  because  environmental  triggers  or  exposure  in  a 
parent  have  been  shown  to  induce  epigenetic  changes  in  the  developing  fetus  and  its 
gametes  which  could  mimic  a  hereditary  change  (Skinner  2008).  There  are  some 
reports  of  hereditary  epigenetic  silencing  of  MLH1  leading  to  Lynch  syndrome,  a 
hereditary  colorectal  cancer  syndrome,  which  persists  through  multiple  generations 
(Gazzoli  et  al.  2002;  Suter  et  al.  2004;  Hitchins  et  al.  2005;  Hitchins  and  Ward 
2009;  Goel  et  al.  2011;  Crepin  et  al.  2012).  One  mechanism  for  this  was  identified 
as a variant in the 5'UTR of MLH1 (Hitchins et al. 2011), suggesting that hereditary
MLH1  methylation  in  these  families  is  due  to  differences  in  the  DNA  code. 
Epigenetic  silencing  of  KILLIN  in  lymphocytes  is  associated  with  early-onset 
cancer  and  features  of  Cowden  syndrome,  but  this  observation  has  not  been  noted 
in  multiple  generations  (Bennett  et  al.  2010).  As  the  animal  data  for 
transgenerational  effects  on  cancer  risk  is  convincing,  it  is  likely  that  researchers 
will  obtain  evidence  for  this  phenomenon  in  humans  as  well. 

9.7  Summary


Approximately  30  years  ago,  aberrant  DNA  methylation  was  first  described  in 
cancer. Since then, changes in DNA methylation, chromatin modifications,
expression of noncoding RNAs, and higher level organization of DNA in the nucleus have
all  been  shown  to  be  involved  in  promotion  of  key  cellular  and  stromal  changes  that 
convert  a  cell  to  a  tumor  cell.  Despite  the  classical  definition  of  epigenetics  as  being 
independent  of  DNA  sequence,  DNA  sequences  influence  specific  epigenetic 
events.  Germline  and  somatic  mutations  and  sequence  variations  have  been 
demonstrated  to  affect  the  normal  epigenetic  state  through  a  variety  of  mechanisms 
and  in  doing  so  can  lead  to  increases  in  cancer  risk  and  tumor  promotion  (Fig.  9.1). 
As our understanding of less well-characterized epigenetic processes such as
chromosome-chromosome interactions, transgenerational epigenetic effects,
noncoding RNAs, and nuclear positioning becomes deeper, we will more fully
understand the role of the genome and our individual genetic differences in shaping
our epigenome and cancer development.

Fig. 9.1 Genetic influences on epigenetic patterning in cancer. Some of the different germline and
somatic influences on epigenetic patterning that lead to an increase in cancer risk or which promote
tumorigenesis are illustrated. Filled circles, methylated CpGs; open circles, unmethylated CpGs;
TSG, tumor-suppressor gene; del, deletion


Abstract Familial aggregation of complex diseases may have many causes in
addition to and apart from genetic predisposition due to common ancestry. For
example, exposure to an environment that induces susceptibility to a disease may
produce similar familial aggregations when the environment is shared by family
members. In general, according to the principles of Johannsen (1903), the
emergence of a disease phenotype is the result of the combined effects of the genotype of
the  individual  and  the  environment  that  it  experiences  during  development.  The 
heritability  of  a  disease  is  a  measure  of  familial  aggregation  in  terms  of 
the  covariances  among  family  members  relative  to  the  variance  in  disease  state  in 
the  general  population.  Thus  heritability  expresses  the  within-family  resemblance, 
observed  by  Darwin  and  inferred  by  him  to  reflect  inheritance,  which  formed  the 
core  of  his  (Darwin  1859)  theory  of  evolution.  Darwin's  inspiration  originated  from 
the  practical  use  of  family  resemblance  in  animal  breeding.  Animal  breeders  have 
long  known  that  a  major  obstacle  to  progress  in  genetic  improvement  is  the 
interaction  between  familial  aggregation  of  environments  and  the  effects  of  similar 
genetics  within  families.  The  potential  importance  of  this  interaction,  recognized  in 
classical studies of the genetic epidemiology of complex diseases and other
quantitative characters, has reemerged in studies of the effects of epigenetic
modifications,  their  variation,  and  their  transmission  between  generations. 


R.E. Furrow (✉) • M.W. Feldman

Department  of  Biology,  Stanford  University,  Stanford,  CA  94305,  USA 
e-mail:  rfurrow@stanford.edu 

F.B.  Christiansen 

Department  of  Bioscience  and  Bioinformatics  Research  Center,  University  of  Aarhus, 
DK-8000  Aarhus  C,  Denmark 

A.K. Naumova and C.M.T. Greenwood (eds.), Epigenetics and Complex Traits,
DOI 10.1007/978-1-4614-8078-5_10, © Springer Science+Business Media New York 2013



10.1  Introduction 

Familial  aggregation  of  complex  diseases  may  have  many  causes  in  addition  to  and 
apart  from  genetic  predisposition  due  to  common  ancestry.  For  example,  exposure 
to  an  environment  that  induces  susceptibility  to  a  disease  may  produce  similar 
familial  aggregations  when  the  environment  is  shared  by  family  members.  In 
general,  according  to  the  principles  of  Johannsen  (1903),  the  emergence  of  a  disease 
phenotype  is  the  result  of  the  combined  effects  of  the  genotype  of  the  individual  and 
the  environment  that  it  experiences  during  development.  The  heritability  of  a 
disease  is  a  measure  of  familial  aggregation  in  terms  of  the  covariances  among 
family  members  relative  to  the  variance  in  disease  state  in  the  general  population. 
Thus  heritability  expresses  the  within-family  resemblance,  observed  by  Darwin  and 
inferred  by  him  to  reflect  inheritance,  which  formed  the  core  of  his  (Darwin  1859) 
theory  of  evolution.  Darwin's  inspiration  originated  from  the  practical  use  of  family 
resemblance  in  animal  breeding.  Animal  breeders  have  long  known  that  a  major 
obstacle  to  progress  in  genetic  improvement  is  the  interaction  between  familial 
aggregation  of  environments  and  the  effects  of  similar  genetics  within  families.  The 
potential  importance  of  this  interaction,  recognized  in  classical  studies  of  the 
genetic  epidemiology  of  complex  diseases  and  other  quantitative  characters,  has 
reemerged  in  studies  of  the  effects  of  epigenetic  modifications,  their  variation,  and 
their  transmission  between  generations. 

Epigenetic modification patterns are known to exhibit dependence on the
environment to which an organism's cells have been exposed (e.g., Skinner 2011).
During  ontogenesis  epigenetic  status  evolves,  and  the  status  of  gene  action  within 
the  cells  of  an  organism  is  a  function  of  its  epigenetic  status.  In  addition,  the 
external  environment  of  the  organism  may  influence  the  presence  or  absence  of 
specific  epigenetic  signals  in  one  or  more  tissues  (Carone  et  al.  2010;  Heijmans 
et  al.  2008;  Kucharski  et  al.  2008;  McGowan  et  al.  2009;  Ng  et  al.  2010;  Sandovici 
et  al.  2011;  Tobi  et  al.  2009;  Verhoeven  et  al.  2010;  Waterland  and 
Jirtle  2003;  Weaver  et  al.  2004).  As  with  the  effects  of  radiation  exposure,  the 
full  epigenetic  impact  of  an  exposure  to  a  specific  environment  may  take 
generations to emerge. A pregnant woman, for example, carries three generations
of genomes (Fig. 10.1), as a result of which there may be apparent inheritance of
the effects of environmental exposures over several generations. Skinner (2011)
described such an epigenetic effect due to exposure to the fungicide vinclozolin.
During gestation an exposed mother can transmit the causative agent to her fetus,
resulting in epigenetic modifications. The agent also influences the fetal germline that will contribute to
her future grandchildren. The exposures of the mother and fetus are truly
simultaneous, and so are the exposures of the eggs of a female fetus and the germline stem
cells of a male fetus. In this case, transmission of induced epigenetic modifications
may  reveal  itself  three  generations  after  the  exposure  ceases. 

Familial aggregation of epigenetic modifications may thus emerge from two
different sources, namely through the transmission of epigenetic marks and because
of shared familial environments experienced by individuals in the population.
These are not mutually exclusive. Transmission of epigenetic modifications in terms
of methylation status of CpG sites is well understood in somatic cells. Similarities of
epigenetic modifications among parents and their progeny are known, but seem at
variance with the widespread removal of methylation from CpG sites in the early
mammalian embryo. Nevertheless, familial aggregations of methylation patterns
have been observed, and recently incomplete resetting, transposon effects, and
small RNA molecules have been observed to play a role in the transgenerational
memory of epigenetic states (reviewed in Daxinger and Whitelaw 2012).

Fig. 10.1 Three generations of cells within one individual. Within a pregnant woman are the cells
of her fetus, in addition to the germ cells of the fetus itself

Future  studies  of  the  biological  bases  for  a  phenotype  may  be  able  to  include  data 
from  many  levels  between  genetics  and  a  specific  phenotype,  including  gene 
expression and epigenetics. Recent work developing an integrated view of
genomics has found widespread interaction between the genome and these various levels
(ENCODE  Project  Consortium  et  al.  2012).  Better  understanding  of  epigenetic 
inheritance,  both  theoretically  and  empirically,  should  help  our  understanding  of 
this  complex  web  of  interactions. 


10.2    Modeling  Epigenetic  Inheritance  in  a  Population 

A  small  but  growing  body  of  literature  constructs  and  analyzes  models  of  epigenetic 
inheritance. Several of these contributions focus on epigenetic impacts on
phenotypic inheritance (Slatkin 2009; Danchin and Wagner 2010; Tal et al. 2010; Furrow
et  al.  2011;  Day  and  Bonduriansky  2011).  All  of  these  models  are  built  around  the 
same  fundamental  concepts:  an  epigenetic  state  can  be  thought  of  as  similar  to  a 
genetic  or  phenotypic  trait,  and  a  particular  site  in  the  genome  can  be  in  one  of  a  set 
of  possible  epigenetic  states.  For  example,  a  cytosine  at  a  particular  CpG  site  in  the 
genome  can  be  methylated  or  unmethylated.  Or  the  state  in  question  could  be  the 
presence or absence of a small interfering RNA or a microRNA. The models
therefore  consider  one  or  more  epigenetic  sites  and  are  completed  by  specifying  the 
mode  of  inheritance,  described  by  parameters  quantifying  the  probabilities  that  an 
epigenetic  site  switches  from  one  state  to  another  between  generations.  These 
switching  probabilities  may  depend  on  the  current  state,  the  genotype,  and  the 
environmental  condition,  although  simplifications  are  usually  needed  for  clear 
analysis.  When  the  focus  of  a  model  is  epigenetic  inheritance,  the  main  emphasis 
of  analysis  is  on  the  contrast  between  comparable  models  of  epigenetic  and  genetic 
inheritance. 

For  example,  Slatkin  (2009)  considered  an  arbitrary  number  of  epigenetic  loci, 
where  the  epigenetic  states  were  transmitted  like  genetic  states,  except  that  the  rates 
of  epigenetic  switching  were  higher  than  those  typical  of  genetic  mutation.  The 
model  assumed  that  the  particular  epigenetic  state  at  each  of  the  loci  affected  the 
overall disease risk of an individual. Slatkin concluded that higher rates of
epigenetic switching between generations greatly reduced the contribution of the
variation at these epigenetic loci to heritability. Similarly, the analysis of Tal
et  al.  (2010)  focused  largely  on  the  consequences  of  nontrivial  rates  of  epigenetic 
switching  (which  they  called  resetting),  in  a  manner  analogous  to  genetic  mutation. 
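
The effect of resetting on parent-offspring resemblance can be illustrated with a deliberately simplified simulation. The sketch below is not the model of Slatkin (2009) or Tal et al. (2010); it assumes a single binary epigenetic site, one transmitting parent, and a hypothetical `reset_rate` parameter giving the probability that the offspring state is redrawn from the population frequency instead of being copied from the parent. Under these assumptions the expected parent-offspring correlation is 1 minus the reset rate.

```python
import random


def correlation(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / (var_x * var_y) ** 0.5


def simulate_parent_offspring(reset_rate, freq=0.5, n_pairs=20000, seed=1):
    """Simulate epigenetic states (0/1) for parent-offspring pairs.

    With probability `reset_rate` the offspring state is redrawn at random
    (state 1 with probability `freq`); otherwise it copies the parent.
    """
    rng = random.Random(seed)
    parents, offspring = [], []
    for _ in range(n_pairs):
        parent_state = 1 if rng.random() < freq else 0
        if rng.random() < reset_rate:
            child_state = 1 if rng.random() < freq else 0  # reset: redraw
        else:
            child_state = parent_state  # faithful transmission
        parents.append(parent_state)
        offspring.append(child_state)
    return parents, offspring


if __name__ == "__main__":
    for rate in (0.0, 0.2, 0.5, 0.9):
        p, o = simulate_parent_offspring(rate)
        print(f"reset rate {rate:.1f}: correlation {correlation(p, o):.2f}")
```

Because the correlation falls linearly with the reset rate in this toy setting, even moderate switching removes much of the resemblance that a genetic locus with a typical mutation rate would retain.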

The  switching  probabilities  in  an  epigenetic  model  can  be  viewed  as  describing 
the  inheritance  of  epigenetic  modifications  or  as  rates  analogous  to  those  of  genetic 
mutations.  In  models  of  genetic  inheritance,  the  environment  has  no  effect  on 
mutation  rates  unless  mutagenic  agents  are  explicitly  included.  But  it  is  well 
understood,  in  a  variety  of  organisms,  that  environmental  conditions  such  as  stress, 
diet,  and  temperature  can  influence  the  epigenetic  states  of  individuals  (Carone 
et  al.  2010;  Heijmans  et  al.  2008;  Kucharski  et  al.  2008;  McGowan  et  al.  2009; 
Ng  et  al.  2010;  Sandovici  et  al.  2011;  Tobi  et  al.  2009;  Verhoeven 
et  al.  2010;  Waterland  and  Jirtle  2003;  Weaver  et  al.  2004).  Furthermore,  the 
rates  of  "epimutation" — the  probability  of  an  epigenetic  state  switching  during 
transmission  between  generations — can  be  several  orders  of  magnitude  higher 
than  those  expected  for  a  genetic  locus. 

The  correlation  between  the  epigenetic  state  of  a  parent  and  its  offspring  will 
depend  on  the  environments  experienced  by  the  individuals.  Figure  10.2  shows  how 
adults  and  their  adult  offspring  may  have  high  correlation  in  epigenetic  state,  even 
if  there  is  significant  resetting  at  some  point  during  gametogenesis.  If  adults  in  a 
given  environment  produce  offspring  that  are  likely  to  remain  in  the  same 
environment  experienced  by  the  adults,  then  they  may  have  the  same  epigenetic 
marks  reinduced  despite  low  fidelity  transmission  of  the  epigenetic  states  to  the 
zygotes  produced  by  these  adults. 
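
The reinduction argument can be sketched numerically. The simulation below assumes, purely for illustration, two equally common environments, complete resetting of the mark in every generation, and environment-specific induction probabilities; the parameter names (`env_stay`, `p_mark_env1`, `p_mark_env0`) are hypothetical and do not come from any of the cited models.

```python
import random


def correlation(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / (var_x * var_y) ** 0.5


def simulate_shared_environment(env_stay, p_mark_env1=0.9, p_mark_env0=0.1,
                                n_pairs=20000, seed=1):
    """Simulate adult marks for parent-offspring pairs with complete reset.

    The offspring lives in the parental environment with probability
    `env_stay`. No mark is transmitted: each individual's mark is induced
    only by its own environment.
    """
    rng = random.Random(seed)

    def induce(env):
        p = p_mark_env1 if env == 1 else p_mark_env0
        return 1 if rng.random() < p else 0

    parent_marks, offspring_marks = [], []
    for _ in range(n_pairs):
        parent_env = rng.randrange(2)
        if rng.random() < env_stay:
            offspring_env = parent_env
        else:
            offspring_env = 1 - parent_env
        parent_marks.append(induce(parent_env))
        offspring_marks.append(induce(offspring_env))
    return parent_marks, offspring_marks


if __name__ == "__main__":
    for stay in (0.5, 0.75, 0.9, 1.0):
        p, o = simulate_shared_environment(stay)
        print(f"P(same environment) {stay:.2f}: correlation {correlation(p, o):.2f}")
```

When offspring environments are independent of parental environments (`env_stay` of 0.5) the correlation is close to zero, whereas strongly shared environments produce a substantial correlation even though nothing is transmitted through the gametes.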

General  application  of  these  population-based  epigenetic  inheritance  models  is 
currently  limited  by  a  shortage  of  transgenerational  epigenetic  data.  Johannes  and 
Colome-Tatche  (2011),  however,  used  inheritance  data  from  an  experiment  with 
the  flowering  plant  Arabidopsis  thaliana,  where  epigenetic  variation  had  been 
induced  while  genetic  variation  was  reduced  (Johannes  et  al.  2009).  Crosses  were 
made  between  inbred  lines  with  minimal  genetic  divergence  but  differences  in  CpG 
methylation  rates.  Their  data  analysis  rested  on  a  quantitative  genetic  model 
incorporating  epigenetic  variation,  switching,  and  transgenerational  transmission. 
They found high heritability in various quantitative traits despite the lack of genetic
variance, and their analysis yielded estimates of the number of epigenetic sites
contributing to variation.

Fig. 10.2 The combined effects of low fidelity epigenetic transmission and shared environmental
effects. Five individuals are shown in two sections of a stratified population living in environments
that cause different epigenetic modifications. Reset at the site with red modifications is complete,
whereas two-thirds of the blue sites are transmitted. Although the epigenetic marks may be lost
during production of the fetus, if an offspring is likely to experience a similar environment to its
parents, then the shared environmental influence on rates of epigenetic switching can produce
familial correlations in epigenetic state

10.3    Epigenetics  and  Phenotypic  Heritability: 
Theoretical  Considerations 

To connect transgenerational epigenetic inheritance to phenotypic correlations
between relatives, we must specify the influence of the epigenetic states on the
phenotype in question. For a continuous quantitative trait, an epigenetic state may
have some quantifiable influence, similar to the effect of allelic variation at a
genetic locus in quantitative genetics (for example, as in Tal et al. 2010). For a
discrete phenotype the epigenetic state can influence a continuous trait such as the
risk of a particular disease (as in Slatkin 2009; Furrow et al. 2011) or some
underlying liability trait that causes a different phenotype if the liability passes
some threshold. The amount of epigenetic contribution to heritability of the
phenotype in question depends on the relative importance of epigenetic versus genetic or
environmental effects, and the magnitude of the covariance between the epigenetic
states of parents and their offspring. As such, we focus on the factors influencing the
familial covariance in epigenetic state.

Fig. 10.3 Faithful and unfaithful epigenetic transmission. Above: Faithful transmission of
an epigenetic modification (red mark) is illustrated in three crosses each with eight offspring.
The epigenotype of a focal parent is shown, while the other parent is anonymous (gray). Below:
The epigenetic modification considered (blue mark) may be lost between generations, or may arise
de novo in the offspring. In this case there is no parent-offspring resemblance in the population

Whatever  the  dynamic  model  we  use  to  describe  epigenetic  transmission 
between generations, the epigenetic effect on heritability boils down to this
covariance. The covariance is not a parameter of the dynamic model but a derived
statistical  property  of  the  system,  and  a  high  covariance  can  occur  in  many  different 
ways.  For  example,  an  epigenetic  state  that  is  transmitted  very  faithfully  between 
generations  may  result  in  a  high  covariance,  particularly  when  there  is  substantial 
epigenetic  variation  in  the  population  (Fig.  10.3).  Or  a  high  covariance  may  result 
when  different  environmental  states  strongly  induce  particular  epigenetic  states, 
and  the  environments  of  parent  and  offspring  are  highly  correlated  (Fig.  10.2).  In 
the  first  case,  our  epigenetic  site  behaves  like  a  genetic  locus  with  different  alleles. 
But  in  the  second,  the  epigenetic  state  is  a  manifestation  of  a  heritable  environment. 
The epigenetic state need not be transmitted meiotically with high fidelity, as long
as  the  induction  of  epigenetic  changes  is  similar  between  parent  and  offspring. 

Suppose  that  an  individual's  phenotype,  P,  can  be  expressed  in  terms  of  an 
epigenetic  contribution  Q  (borrowing  notation  from  Johannes  et  al.  2009),  and  a 
non-epigenetic  contribution  N.  The  non-epigenetic  contribution  may  include  both 
genetic  and  environmental  effects.  If  we  use  the  subscript  A  for  a  focal  adult  and 
O  for  one  of  its  offspring,  then  we  can  express  an  adult  phenotype  and  that  of  one  of 
its  offspring  as 

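$$P_A = Q_A + N_A \quad \text{and} \quad P_O = Q_O + N_O,$$

and express the covariance between the two individuals as

$$\begin{aligned}
\mathrm{Cov}(P_A, P_O) &= \mathrm{Cov}(Q_A + N_A,\; Q_O + N_O) \\
&= \mathrm{Cov}(Q_A, Q_O) + \mathrm{Cov}(N_A, Q_O) + \mathrm{Cov}(Q_A, N_O) + \mathrm{Cov}(N_A, N_O).
\end{aligned} \tag{10.1}$$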

In  this  framework,  phenotypic  similarities  between  relatives  could  stem  purely 
from shared genetics or environment, captured in the term $\mathrm{Cov}(N_A, N_O)$. There may
also be epigenetic contributions purely through direct transmission, $\mathrm{Cov}(Q_A, Q_O)$.
But epigenetic contributions to phenotypic covariance may also stem from
associations between environments or genetics and epigenetic states (the terms
$\mathrm{Cov}(N_A, Q_O) + \mathrm{Cov}(Q_A, N_O)$). Note that Eq. (10.1) does not explicitly incorporate
any  interaction  terms  between  epigenetics,  genetics,  and  the  environment.  It  simply 
allows  for  covariances  between  the  different  phenotypic  contributions  across 
generations. 
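
The four-term decomposition in Eq. (10.1) follows from the bilinearity of covariance, and it can be checked numerically on arbitrary data. The sketch below generates made-up epigenetic and non-epigenetic contributions for parents and offspring; the coefficients and the shared factor are arbitrary assumptions chosen only so that all four terms are nonzero.

```python
import random


def cov(x, y):
    """Sample covariance (divisor n) of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


rng = random.Random(0)
n = 5000

# Arbitrary illustrative data: a shared family factor induces dependence.
shared = [rng.gauss(0, 1) for _ in range(n)]
q_a = [0.8 * s + rng.gauss(0, 1) for s in shared]           # parent epigenetic part
n_a = [0.5 * s + rng.gauss(0, 1) for s in shared]           # parent non-epigenetic part
q_o = [0.6 * q + 0.3 * s + rng.gauss(0, 1) for q, s in zip(q_a, shared)]
n_o = [0.4 * m + rng.gauss(0, 1) for m in n_a]

p_a = [q + m for q, m in zip(q_a, n_a)]                     # P_A = Q_A + N_A
p_o = [q + m for q, m in zip(q_o, n_o)]                     # P_O = Q_O + N_O

terms = {
    "Cov(Q_A, Q_O)": cov(q_a, q_o),
    "Cov(N_A, Q_O)": cov(n_a, q_o),
    "Cov(Q_A, N_O)": cov(q_a, n_o),
    "Cov(N_A, N_O)": cov(n_a, n_o),
}
lhs = cov(p_a, p_o)
rhs = sum(terms.values())

if __name__ == "__main__":
    for name, value in terms.items():
        print(f"{name}: {value:.3f}")
    print(f"Cov(P_A, P_O): {lhs:.3f}  sum of four terms: {rhs:.3f}")
```

The identity holds exactly for any data set, up to floating-point rounding; what differs between biological scenarios is only how the total is divided among the four terms.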

Initial  theoretical  work  focused  on  epigenetic  inheritance  without  incorporating 
variation  in  the  environment.  In  this  case,  epigenetic  variation  is  nothing  more  than 
genetic  variation  with  high  mutation  rates.  Accordingly,  such  studies  compared 
$\mathrm{Cov}(Q_A, Q_O)$ and $\mathrm{Cov}(N_A, N_O)$ and found that the fidelity of epigenetic
transmission between generations crucially determined the epigenetic contribution to
heritability. Models of Slatkin (2009) and Tal et al. (2010) considered one or more
epigenetic sites, with a fixed probability that the state will be reset during
transmission from parent to offspring. By varying the parameter corresponding to rate
of reset, they found that the epigenetic contribution to heritability decreased quickly
as resetting increased. Only very low epigenetic reset rates would lead to a
significant epigenetic contribution to the heritability of a phenotype (Slatkin 2009; Tal
et al. 2010). Some empirical research has suggested that transgenerational
inheritance of epigenetic states is unlikely in mammals, due to the widespread loss of
methylation  and  other  markers  in  the  genome  during  gametogenesis,  but  exceptions 
to  this  effect  have  recently  been  noted  (reviewed  in  Daxinger  and  Whitelaw  2012). 

If  the  environment  can  influence  rates  of  epigenetic  switching,  shared 
environments  can  mimic  the  effects  of  faithful  meiotic  transmission  of  epigenetic 
states.  Furrow  et  al.  (2011)  modeled  such  a  population  and  found  several  different 
scenarios in which epigenetic inheritance could significantly contribute to
phenotypic  heritability:  (1)  when  rates  of  epigenetic  reset  were  low  between  parent 
and  offspring,  (2)  when  environmental  correlations  were  high  between  family 
members,  and  (3)  when  both  reset  rates  and  environmental  correlations  were 
intermediate between parent and offspring. This model tracked individuals'
epigenetic states, environmental states, and the states' transmission to offspring. The
process of reproduction allowed for meiotic reset of states, in addition to induction
of epigenetic state changes due to environmental influence, as illustrated in
Fig. 10.2. If rates of epigenetic reset were low regardless of the environment, then
the Furrow et al. (2011) model was effectively identical to that of Slatkin (2009) or
Tal  et  al.  (2010).  But  the  possibility  of  environmental  correlations  between  parent 
and  offspring  allowed  for  new  routes  to  high  familial  aggregations  of  epigenetic 
state.  With  high  environmental  correlations,  and  environmental  states  that  caused 
different  rates  of  epigenetic  induction,  correlations  between  the  epigenetic  states  of 
parent  and  offspring  could  be  high.  These  epigenetic  correlations  were  especially 
high  when  individuals  mated  assortatively  with  respect  to  the  environmental  state 
they  experienced.  Furthermore,  high  contributions  to  heritability  did  not  even 
require low epigenetic reset or high environmental correlations. Partial
environmental correlations in conjunction with moderately faithful meiotic epigenetic
transmission also yielded high epigenetic correlations between parent and
offspring. Note that this model actually focused entirely on influences contained
within the first summand in Eq. (10.1): there was no direct environmental influence
on phenotype and hence no N term. In models that allow this environmental
influence as well, we can expect environmental correlations to strengthen the
epigenetic influence on phenotypic heritability even further. It would be possible
to have significant contributions to phenotypic covariance from all four of the
summands in Eq. (10.1). In this case, epigenetic contributions to phenotypic
heritability are better seen as the combined effects of partial meiotic transmission and
shared  environments  between  generations.  This  insight  applies  equally  well  to 
classical  analyses  of  cultural  inheritance  in  circumstances  where  the  environment 
cannot  be  controlled — which  is  the  rule  whenever  human  populations  are 
investigated  (Cavalli-Sforza  and  Feldman  1973,  1981;  Feldman  and  Cavalli- 
Sforza  1979;  Feldman  et  al.  1995,  2000;  Otto  et  al.  1995). 
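
The qualitative point above, that shared environments can produce high parent-offspring epigenetic correlations even when meiotic transmission is unfaithful, can be illustrated with a deliberately simplified toy calculation. This is not the model analyzed in this chapter. It is a minimal sketch under assumed ingredients: a binary environment with equal state frequencies, a parent-offspring environmental correlation `rho`, environment-specific induction probabilities `u0` and `u1`, and a per-generation `reset` probability. All of these names are hypothetical.

```python
from itertools import product


def parent_offspring_correlation(reset, rho, u0, u1):
    """Exact correlation between parental and offspring epigenetic states.

    Toy assumptions (not taken from the chapter):
      - the environment is binary, with both states equally frequent;
      - the offspring environment matches the parental one with
        probability (1 + rho) / 2, so rho is the environmental correlation;
      - the parental epigenetic state is induced with probability u0 or u1,
        depending on the parental environment;
      - the offspring keeps the parental state with probability 1 - reset,
        otherwise its state is induced afresh by its own environment.
    """
    induce = {0: u0, 1: u1}
    joint = {}  # (parent state, offspring state) -> probability
    for e_par, e_off, s_par in product((0, 1), repeat=3):
        p_match = (1 + rho) / 2 if e_off == e_par else (1 - rho) / 2
        p_env = 0.5 * p_match
        p_spar = induce[e_par] if s_par == 1 else 1 - induce[e_par]
        p_soff1 = (1 - reset) * s_par + reset * induce[e_off]
        for s_off, p_soff in ((1, p_soff1), (0, 1 - p_soff1)):
            key = (s_par, s_off)
            joint[key] = joint.get(key, 0.0) + p_env * p_spar * p_soff
    mean_par = joint[(1, 0)] + joint[(1, 1)]
    mean_off = joint[(0, 1)] + joint[(1, 1)]
    cov = joint[(1, 1)] - mean_par * mean_off
    sd_par = (mean_par * (1 - mean_par)) ** 0.5
    sd_off = (mean_off * (1 - mean_off)) ** 0.5
    return cov / (sd_par * sd_off)
```

In this sketch, complete reset with uncorrelated environments gives a correlation of zero, whereas partial reset combined with a partial environmental correlation can still give a substantial correlation, in line with the verbal argument above.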

Although  it  is  never  trivial  to  evaluate  the  environmental  influence  in  such 
models,  organisms  amenable  to  experimentation  can  shed  light  on  the  possibilities. 
A  study  by  Verhoeven  et  al.  (2010)  on  apomictic  dandelions  tracked  the  effects  of 
environmental  stresses.  Epigenetic  changes  were  observed  in  the  exposed 
individuals,  and  the  homologous  epigenetic  sites  were  observed  in  genetically 
identical offspring. In response to stress, in particular chemical induction of herbivore
or pathogen defences, plants showed widespread epigenetic changes in the
genome,  and  many  of  these  changes  persisted  in  offspring  raised  from  seeds  in  an 
environment  lacking  the  stress.  Such  observations  may  allow  estimates  of 
environment-dependent  epimutation  rates,  but  because  the  phenomena  may  reflect 
a combination of many biological processes, the conclusions drawn from one
species  may  not  apply  to  distantly  related  species.


10.4    Epigenetics  and  Phenotypic  Heritability:
Examples  in  Human  Populations 

Very  little  work  has  simultaneously  tracked  environments,  epigenetics,  and 
phenotypes  in  human  populations.  However,  human  populations  have  occasionally 
experienced  strong,  unexpected  stresses  that  offer  some  insight  into  the  potential 
interactions  among  environment,  epigenetics,  and  phenotype.  The  stress  of  a 
famine,  for  instance,  can  strongly  influence  the  phenotypes  of  individuals  gestating 
or  born  during  such  a  period.  During  World  War  II,  the  combination  of  a  freezing 
cold  winter  and  the  German  blockade  of  parts  of  the  Netherlands  led  to  a  famine  in 
the  winter  of  1944-1945,  a  period  referred  to  as  the  Dutch  Hongerwinter. 
Individuals  prenatally  exposed  to  this  famine  showed  a  significant  increase  in 
their  risk  of  acquiring  schizophrenia  (Susser  and  Lin  1992;  Susser  et  al.  1996), 
and  even  decades  later,  these  individuals  show  epigenetic  differences  from  their 
same-sex siblings (Heijmans et al. 2008), although the effect appears to depend on
both  sex  and  gestational  timing  (Tobi  et  al.  2009).  A  spike  in  schizophrenia  risk 
occurred  among  individuals  conceived  during  the  Chinese  famine  of  1959-1961  as 
well  (St  Clair  et  al.  2005).  Obesity  and  other  health  risks  are  also  associated  with  the 
Dutch  Hongerwinter  (Roseboom  et  al.  2006),  and  similar  effects  are  associated  with 
the  famine  in  Biafra  during  the  Nigerian  civil  war  of  1967-1970  (Hult  et  al.  2010). 

Studies  in  other  mammals  have  demonstrated  gestational  dietary  effects  on  the 
epigenetic  profile  of  offspring.  A  study  of  sheep  found  that  maternal  undernutrition 
was  associated  with  epigenetic  changes  in  both  CpG  methylation  and  histone 
modifications  in  fetal  hypothalamic  pathways  (Begum  et  al.  2012).  Methylation 
rates  in  mice  are  affected  by  both  maternal  (Waterland  and  Jirtle  2003)  and  paternal 
diet  (Carone  et  al.  2010;  Ng  et  al.  2010),  and  paternal  effects  in  mammals  have  been 
found  to  influence  even  the  development  of  grand-offspring  (Curley  et  al.  2011). 
It  seems  likely  that  at  least  some  of  the  observed  human  phenotypic  effects  of 
the famines mentioned above may be mediated by environmentally induced epigenetic
changes, as appears to be the case with other examples of stress in
humans  (Borghol  et  al.  2012;  McGowan  et  al.  2009;  Tyrka  et  al.  2012;  Uddin 
et  al.  2010;  Waterland  et  al.  2010). 


10.5    Epigenetics  and  Phenotypic  Evolution 

Another suite of models has focused on the evolutionary consequences of epigenetic
inheritance due to adaptive variation in epigenetically controlled phenotypes
that  may  vary  between  generations  (Lachmann  and  Jablonka  1996;  Thattai  and 
van  Oudenaarden  2004;  Kussell  and  Leibler  2005;  Salathe  et  al.  2009;  Gaal 
et  al.  2010;  Feinberg  and  Irizarry  2010;  Day  and  Bonduriansky  2011;  Liberman 
et  al.  2011;  Carja  and  Feldman  2012;  Geoghegan  and  Spencer  2012).  In  essence, 
these  models  extend  investigations  into  phenotypic  heritability  by  considering 
fitness  as  the  phenotype  in  question.  Some  models  focus  on  epigenetic  sources  of
fitness  variation,  while  others  consider  the  possibility  of  interaction  between  effects
of  epigenetic  and  genetic  variation. 

Day and Bonduriansky (2011) showed in a simple model of genetic and epigenetic
interaction that rates of epigenetic reset between generations determine
whether  a  population  will  evolve  toward  an  equilibrium  with  epigenetic  variation 
where  the  effects  of  selection  balance  with  the  rates  of  epigenetic  reset,  or  toward  an 
equilibrium in which there is no epigenetic variation. Geoghegan and Spencer
(2012) found that there may be more than one stable equilibrium state in the
population, i.e., different populations may show different levels of epigenetic
variation  without  any  differences  in  the  environment  or  epigenetic  transmission. 
In  model  populations  where  the  rate  of  epigenetic  switching  can  evolve,  the 
evolutionarily  stable  epimutation  rate  will  be  related  to  the  rate  at  which  the 
environment  changes.  In  all  of  these  models  the  rates  of  epigenetic  reset  between 
generations  (or  the  resulting  correlation  between  epigenetic  state  of  parent  and 
offspring)  determine  the  evolutionary  dynamics  of  the  epigenetically  induced 
phenotypic  variation,  in  particular  the  levels  of  variation  at  equilibrium  with  respect 
to  selection  and  epigenetic  reset  between  generations. 
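
The balance between selection and epigenetic reset can be made concrete with a minimal haploid sketch. This is not any of the cited models. It assumes a single epigenetic site with a favored state of relative fitness `1 + s`, a per-generation `reset` probability that converts the favored state to the unfavored one, and an optional `induce` probability for the reverse switch. The function names are hypothetical.

```python
def next_frequency(p, s, reset, induce=0.0):
    """One generation: selection on the favored state, then epigenetic switching."""
    mean_fitness = 1.0 + s * p
    p_selected = p * (1.0 + s) / mean_fitness  # frequency after selection
    return p_selected * (1.0 - reset) + (1.0 - p_selected) * induce


def equilibrium_frequency(s, reset, induce=0.0, p0=0.5, generations=5000):
    """Iterate the recursion until it has (numerically) settled."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s, reset, induce)
    return p


def relative_load(p, s):
    """Proportional reduction in mean fitness relative to the favored state."""
    return s * (1.0 - p) / (1.0 + s)
```

With `s = 0.1` and `reset = 0.01` the favored state settles near a frequency of 0.89 rather than fixing, and the resulting load is about equal to the reset rate. With `reset = 0.2` the favored state is lost despite selection. In this toy setting, variation at equilibrium is therefore governed by the ratio of reset to selection, as the models above indicate.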

The  level  of  phenotypic  variation  originating  from  epigenetic  variation  is 
maintained  by  the  combination  of  environmental  effects,  selection,  and  rates  of 
epigenetic  reset  between  generations.  However,  the  contribution  of  epigenetic 
variation  to  phenotypic  inheritance  is  more  narrowly  determined  by  the  direct 
transmission  of  epigenetic  states  to  offspring  and  by  the  covariance  between 
non-epigenetic  influences  and  epigenetic  influences  on  the  phenotypes  of  parents 
and  their  offspring.  The  levels  of  epigenetic  variation  in  the  population  may 
therefore  be  a  weak  indicator  of  the  basis  for  phenotypic  evolution,  in  particular 
for  adaptation  in  response  to  selection.  However,  a  study  of  a  flowering  plant  found 
associations  between  browsing  damage  on  the  individual  and  its  epigenetic  status 
(Herrera  and  Bazaga  2011),  and,  within  yeast  populations,  niche  breadth  and 
epigenetic  variation  were  found  to  be  correlated  (Herrera  et  al.  2012).  These  studies 
suggest  a  possible  role  for  epigenetic  variation  in  short-term  adaptation. 

Although  these  results  are  promising,  it  is  unclear  whether  a  phenotype  favored 
in  a  new  environment  will  continue  to  increase  in  frequency  for  more  than  a  few 
generations  after  the  population  arrived  in  that  environment.  An  equilibrium 
maintained  by  a  balance  between  selection  and  epigenetic  dynamics  may  be  far 
from the phenotypic optimum reached by analogous genetic systems, causing an
"epimutational load" on the population. The evolutionary dynamics of environmentally
induced epigenetic modifications may therefore be consistent with their
ability  to  cause  disease. 


10.6    Measuring  Epimutation  Rates 

One of the major challenges in empirical studies of epigenetics in the next decade
will  be  the  measurement  of  epigenetic  switching  rates  and  their  variation  through  an 
organism's lifespan and across different environments and tissues, for all types of
epigenetic marks (Fraga et al. 2005). Models of both evolution and phenotypic
inheritance show that the fidelity of epigenetic transmission between generations is
a critical parameter in inferences concerning either familial correlations or dynamics
of adaptation. However, estimates of these rates are likely to depend on the age
of  the  organisms.  Inheritance  of  an  epigenetic  state  from  an  adult  to  its  juvenile 
offspring  necessarily  focuses  on  direct  meiotic  transmission  and  effects  of  shared 
environment  early  in  life,  but  fails  to  illuminate  the  effects  of  shared  environment 
that  may  be  manifest  on  phenotypes  measured  later  in  life.  This  potentially  leads  to 
an underestimate of the importance of epigenetics in the inheritance of adult
phenotypes, and to poor separation of the effects of direct transmission and common
environment in early life.


10.7  Conclusion 

Humans  vary  phenotypically  due  to  many  factors,  some  of  which  may  be  heritable, 
and  some  not.  Epigenetic  contributions  to  heritable  phenotypic  variation  may  stem 
from  both  genetic  and  environmental  variation,  in  addition  to  random  changes  in 
the  epigenetic  state  at  some  genomic  positions.  Epigenetic  states  may  be  directly 
transmitted  through  meiosis,  thus  mimicking  genetic  transmission  with  a  high 
mutation  rate.  However,  similar  environments  of  parents  and  offspring  may  also 
produce  phenotypic  parent-offspring  correlations  in  epigenetic  states.  In  the  study 
of transgenerational epigenetic inheritance, care must therefore be taken to consider
both the effects of direct transmission and those of the purely statistical associations
between environment and epigenotype that emerge because of environmental
influences  on  epigenotype.  Both  routes  to  transgenerational  epigenetic  inheritance 
have  the  potential  to  influence  adaptation  and  phenotypic  heritability,  and  both 
of  these  phenomena  may  explain  some  of  the  heritability  not  accounted  for  in 
genome-wide  association  studies  of  complex  phenotypes  and  diseases  (Eichler 
et  al.  2010;  Goldstein  2009;  Maher  2008;  Petronis  2010). 

Epigenetic  variation  also  offers  a  source  of  heritable  phenotypic  variation  upon 
which  natural  selection  can  act.  Epigenetic  variation  in  a  population  should  evolve 
toward  a  balance  between  selection  and  epigenetic  modifications.  But  high  rates  of 
epigenetic  modification,  in  an  analogy  to  high  mutation  rates,  will  resist  the 
adaptive  push  of  natural  selection.  For  epigenetic  modifications  to  allow  adaptation 
in  static  environments,  we  expect  that  the  rates  of  epigenetic  change  at  a  site  will  be 
very  low,  akin  to  typical  rates  of  genetic  mutation. 

As  a  body  of  theory  is  built,  studies  must  revisit  classic  population  genetics 
results,  while  relaxing  assumptions  about  the  magnitude  of  mutation  rates  and  the 
influence  of  the  environment  on  inheritance  processes.  Models  of  cultural  evolution 
may  also  offer  insights  into  the  possible  role  of  epigenetic  inheritance  in  phenotypic 
evolution.  At  the  same  time,  empirical  research  will  clarify  reasonable  parameter 
ranges and assumptions. Only through the iterative process of improving both
models and empirical methods can we begin to understand the full role of epigenetic
variation in phenotypic heritability and evolution.


Chapter 11

Statistical  Approaches  for  Detecting 
Transgenerational  Genetic  Effects 
in  Humans 

Janet  S.  Sinsheimer  and  Michelle  M.  Creek 


Abstract  Transgenerational  genetic  effects  occur  when  the  genes  of  one 
generation  influence  the  phenotype  of  subsequent  generations  without  Mendelian 
transmission  of  alleles,  possibly  through  inherited  epigenetic  effects.  The  evidence 
for  transgenerational  genetic  effects  in  humans  comes  predominantly  from  genetic 
epidemiology studies, which present a number of statistical challenges for their
analysis and interpretation. In this chapter, we outline some of the genetic epidemiologic
study designs and statistical analysis approaches that have been used to
detect  these  effects  and  discuss  their  strengths  and  weaknesses. 

11.1  Introduction 

Genetic  epidemiology  concentrates  on  disease  risks  due  to  a  subject's  own  genes  and 
environment.  Although  we  gain  much  etiological  insight  from  these  studies,  many 
genetic  determinants  of  disease  remain  undiscovered.  One  possibility  is  that 
transgenerational  genetic  effects  play  a  role  in  their  etiology.  Transgenerational 
genetic  effects  occur  when  the  genes  of  one  generation  influence  the  phenotype  of 
subsequent generations without Mendelian transmission of alleles (Fig. 11.1), possibly
through inherited epigenetic effects (Gluckman et al. 2007; Nadeau 2009).
Most  commonly,  these  transgenerational  genetic  effects  are  parental  genes  having 
an effect on their offspring's phenotype, but more distant ancestors can also have effects.

Fig. 11.1 Schematic depiction of potentially detectable effects. Disease risk in the offspring
(denoted by the dark circle) can be due to maternal (M), paternal (P), offspring (Off), parent of
origin (PoO), long-range transgenerational (LRT-M maternal side, LRT-P paternal side),
maternal-fetal genotype incompatibility (MFG), or paternal-fetal genotype incompatibility
(PFG) effects. These effects are not mutually exclusive. The effects could be genetic or environmental
in origin. When genetic in origin, they have direct effects or are epigenetic or environmentally
mediated

Less  commonly  considered,  an  offspring's  genotype  could  also  elicit  a  phenotype  in 
his/her  mother.  In  this  chapter,  we  explore  statistical  approaches  for  detecting 
transgenerational  genetic  effects  in  humans  with  genetic  epidemiological  data. 
With  a  few  exceptions,  these  studies  provide  only  indirect  evidence  consistent 
with  epigenetic  phenomena  and,  in  some  of  these  cases,  the  underlying  explanations 
for  transgenerational  genetic  effects  will  not  involve  modifications  to  DNA  or 
chromatin.  However,  we  argue  these  studies  provide  an  excellent  starting  point  for 
hypothesis  generation  and  for  further  investigations  leading  to  more  direct  tests 
for  epigenetic  effects. 

To  clarify  what  we  mean  by  transgenerational  genetic  effects,  we  first  provide 
descriptions  of  some  common  ones  before  going  on  to  describe  appropriate  study 
designs  and  analysis  approaches  to  detect  them.  For  notational  convenience  we 
drop  the  "genetic"  and  from  now  on  refer  to  them  collectively  as  transgenerational 
effects.  Figure  11.1  illustrates  some  transgenerational  effects  discussed  in  this 
chapter  including  maternal  effects  (M),  paternal  effects  (P),  parent  of  origin  effects 
(PoO),  maternal-fetal  genotype  (MFG)  incompatibility,  paternal-fetal  genotype 
(PFG)  incompatibility,  and  long-range  transgenerational  effects  from  the  maternal 
or  paternal  side  (LRT-M  or  LRT-P). 

Maternal  genetic  effects  and  paternal  genetic  effects.  There  are  observations  that 
particular  maternal  genotypes  are  strongly  associated  with  offspring  phenotypes, 
regardless  of  what  alleles  the  offspring  inherits  (Kistner  and  Weinberg  2004;  Wheeler 
and  Cordell  2007;  Weinberg  et  al.  1998).  Although  discussed  less  often,  paternal 
genotypes are also associated with offspring phenotype. A possible mechanism for
these effects is the influence of parental genotype on the offspring's environment.
This  environmental  effect  could  have  no  effect  on  the  offspring's  DNA  or  chromatin 
or  it  could  induce  epigenetic  modifications.  Alternatively,  parental  genotype  could 
directly  affect  the  offspring's  epigenome  or  the  effects  could  be  due  to  maternally 
derived  mitochondrial  DNA. 

Parent  of  origin  effects.  PoO  effects  occur  when  the  degree  of  association  of  an 
allele  with  an  offspring's  phenotype  depends  on  the  sex  of  the  transmitting  parent 
(in  Fig.  11.1,  the  effects  of  the  paternally  transmitted  allele  prevail).  Although 
silencing  through  methylation  or  histone  modification,  commonly  referred  to  as 
imprinting,  is  one  form  of  PoO  effects,  other  mechanisms  can  also  lead  to  these 
effects (Guilmatre and Sharp 2012). These mechanisms include mutational transmission
bias and oocyte RNAs or proteins.

MFG  incompatibility.  The  effects  of  maternal  genes  on  the  offspring's  disease  risk 
may  vary  depending  on  the  offspring's  genotype  (Fig.  11.1).  MFG  incompatibilities 
are  gene  interactions  that  produce  adverse  effects  on  the  developing  fetus.  These 
gene-gene  interactions  differ  from  typical  ones  because  maternal  genes  interact 
with offspring genes. MFG incompatibilities are involved in complex diseases,
even adult-onset diseases where the effects may not be evident until long after the
event initiated by the MFG incompatibility has occurred and subsided (Palmer et al. 2002,
2006;  Sinsheimer  et  al.  2003).  In  principle,  there  could  be  paternal-offspring  gene 
interactions  (PFG  incompatibility),  although  there  is  less  biological  support  for 
these  interactions  than  for  MFG  incompatibility. 

As with PoO and maternal effects, the mechanisms by which MFG or PFG
incompatibilities occur could be methylation or chromatin modification, but other
mechanisms are possible. The prototypical MFG incompatibility is RHD incompatibility,
which can lead to erythroblastosis, liver damage, hypoxia, or death from
hemolytic disease of the newborn (HDN) (Guyton 1981). The biological mechanism
underlying RHD-induced HDN is well known (Stratchen and Reed 2003) and
we provide a simplified description. Alleles at the RHD locus are classified into two
types, D and d. The D allele codes for an antigen on the erythrocyte surface and the
d allele is a null allele. RHD-induced HDN occurs when a mother who is homozygous
for the null allele (d/d) mounts an IgG alloimmune response to her d/D
offspring's erythrocytes, damaging their ability to carry oxygen and releasing
bilirubin. Maternal-fetal ABO incompatibility leads to HDN by a similar mechanism
(Guyton 1981). RHD and ABO incompatibilities are implicated as risk factors
for  complex  diseases  (Cannon  et  al.  2002;  Dahlquist  et  al.  1999;  Hollister 
et  al.  1996;  Insel  et  al.  2005;  Juul-Dam  et  al.  2001;  Kraft  et  al.  2004;  Palmer 
et  al.  2002;  Stubbs  et  al.  1985).  Although  RHD  incompatibility  involves  the  same 
locus  in  mother  and  offspring,  MFG  incompatibilities  can  also  occur  between  one 
locus  in  the  mother  and  another  locus  in  the  offspring  (Chen  et  al.  2009). 
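
The simplified description of RHD incompatibility above can be expressed as an indicator on mother-offspring genotype pairs, which is the kind of variable an MFG analysis would code. The following is a minimal sketch. Writing genotypes as two-character strings (`"DD"`, `"Dd"`, `"dd"`) and the function names are illustrative assumptions, and the sketch ignores clinical details such as prior sensitization.

```python
def is_rhd_incompatible(mother, offspring):
    """True when a d/d mother carries an offspring with a D allele."""
    return set(mother) == {"d"} and "D" in offspring


def incompatibility_probability(mother, father):
    """Mendelian probability that a child of this couple is incompatible with the mother."""
    # Each parent transmits either allele with probability 1/2.
    children = [m_allele + p_allele for m_allele in mother for p_allele in father]
    return sum(is_rhd_incompatible(mother, child) for child in children) / len(children)
```

For example, a d/d mother and a D/d father have an incompatible child with probability 1/2, whereas no child of a D/d mother is incompatible under this definition.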

Long-range  transgenerational  (LRT)  effects.  It  is  difficult  to  distinguish  epigenetic 
effects  from  shared  environment  unless  the  transgenerational  effect  persists  over 
multiple  generations  but  the  environmental  exposure  does  not.  Environmental 
exposures, even if they are short lived, can affect three generations without
involving specific inherited epigenetic factors. If a pregnant woman is exposed to an
environmental stimulus, she, her fetus, her gametes, and her fetus' gametes can be
directly affected without involving epigenetic modifications. Likewise, the environment
can affect two generations when a man is exposed because his gametes can be
affected. LRT genetic effects caused by inherited epigenetic effects are well
documented  in  model  organisms  (Nadeau  2009),  and  there  is  evidence  of  their  role 
in  common  diseases  in  humans  (e.g.,  Benyshek  et  al.  2001;  Klip  et  al.  2002). 


11.2    Study  Designs 

In  this  section,  we  discuss  epidemiological  study  designs  used  to  detect 
transgenerational  effects.  Most  statistical  approaches  to  study  these  effects  have 
been designed for binary, qualitative traits. Therefore, when we discuss specific
study designs and analyses, we concentrate on these binary traits and refer to cases
and  controls.  When  methods  for  continuous  traits  are  commonplace  we  discuss  them 
in  the  analysis  section  along  with  the  appropriate  modifications  to  the  study  design. 

Case-mother, control-mother (CMCM). The CMCM design allows detection of
offspring genetic effects, maternal genetic effects, and their interactions by comparing
the genotype distributions of affected individuals and their mothers to the
genotype  distributions  of  unaffected  individuals  and  their  mothers  (Ainsworth 
et  al.  2011).  This  design  is  an  extension  of  the  popular  case-control  design  of 
genome-wide  association  and  is  subject  to  the  same  limitations,  such  as 
confounding  from  population  substructure. 

Case-parent  trios  (CPTs).  CPTs  were  first  popularized  in  genetics  to  avoid  the 
problems  of  population  substructure  that  originally  plagued  case-control  genetic 
analysis  (e.g.,  Laird  and  Lange  2010).  Although  this  advantage  is  largely  eliminated 
by  methods  that  control  for  ancestry  in  case-control  studies  (e.g.,  Edwards  and  Gao 
2012),  CPTs  are  popular  for  detecting  transgenerational  effects  associated  with 
disease (Cordell 2004; Cordell et al. 2004; Laird and Lange 2010). Compared with
CMCMs, CPTs expand the range of genetic models that can be considered. For example,
one can test for PoO and paternal effects. For a bi-allelic locus there are 15 possible
maternal-paternal-offspring  genotype  combinations.  Case-mother  and  case-father 
duos  can  be  included  along  with  the  CPTs  by  treating  the  duos  as  trios  with  randomly 
missing  data  (Sinsheimer  et  al.  2003;  Weinberg  et  al.  1998).  These  models  can  be 
modified  to  include  control-mother  duos  or  parents  of  unaffected  offspring 
(Vermeulen  et  al.  2009)  but  then  population  stratification  comes  back  into  play. 
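
The count of 15 maternal-paternal-offspring genotype combinations at a bi-allelic locus can be checked by direct enumeration. The sketch below assumes Mendelian transmission with no mutation or genotyping error. The allele labels `A` and `a` and the function name are illustrative.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")


def possible_trios():
    """All (mother, father, child) genotype triples compatible with Mendelian transmission."""
    trios = set()
    for mother, father in product(GENOTYPES, repeat=2):
        # The child receives one allele from each parent.
        for m_allele, p_allele in product(mother, father):
            child = "".join(sorted(m_allele + p_allele))  # "aA" becomes "Aa"
            trios.add((mother, father, child))
    return sorted(trios)
```

Calling `len(possible_trios())` gives 15. Among these, only the trio in which both parents and the child are heterozygous leaves the parental origin of the child's alleles unresolved.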

Nuclear  families.  The  CPT  design  can  be  extended  to  include  unaffected  and  affected 
siblings  of  the  case  (Kraft  et  al.  2004).  These  extensions  provide  additional  power  and 
increase the range of genetic models that can be considered but may require additional modeling
assumptions  or  else  be  biased.  As  we  discuss  in  the  statistical  analysis  section,  the  study 
design  dictates  the  questions  that  can  be  posed  as  well  as  the  assumptions  imposed.

General  pedigree  data.  Intuitively,  the  ideal  study  design  for  detecting
transgenerational  effects  allows  simultaneous  analysis  of  unrelated  individuals, 
small  pedigrees,  and  large  pedigrees.  The  inclusion  of  large,  multigenerational 
pedigrees  provides  a  way  to  study  a  variety  of  complex  patterns  and  detect  LRT 
effects.  Being  able  to  analyze  all  family  members  is  highly  efficient.  Breaking  up 
large pedigrees into subsets can introduce bias (Childs et al. 2010, 2011). Pedigrees
provide  a  way  to  model  phenotype  data  in  the  absence  of  genotype  data  (see  the 
statistical  analysis  section).  One  disadvantage  is  that,  depending  on  the  research 
question,  using  multigenerational  families  requires  more  restrictive  modeling 
assumptions  to  be  computationally  feasible. 


11.3    Statistical  Analysis  Approaches  for  Detecting 
Transgenerational  Effects 

We  briefly  outline  some  statistical  analysis  approaches.  Because  the  approaches 
depend  on  the  available  data,  we  group  them  by  data  type:  (1)  phenotype  data  only, 
(2)  phenotype  and  genotype  data,  and  (3)  phenotype,  genotype,  and  epigenetic  data. 


11.3.1    Approaches  Using  Only  Phenotype  Data 

Prior to the widespread availability of genotype data, evidence supporting the
existence  of  transgenerational  effects  in  humans  came  from  the  inference  of 
phenotypic  inheritance  patterns  inconsistent  with  Mendelian  inheritance.  These 
approaches  generally  require  large  pedigrees  to  be  effective  but,  with  marked 
environmental exposures, following matrilineal or patrilineal lines provides evidence
of transgenerational effects (e.g., Gluckman et al. 2007).

Indirect  evidence  for  transgenerational  effects  can,  in  principle,  be  obtained 
from  analyzing  pedigrees  with  complex  segregation  analyses  (e.g.,  Khoury 
et  al.  1993).  These  analyses  use  correlations  among  family  members'  phenotypes 
to  infer  the  existence  of  major  genes  acting  in  a  Mendelian  manner,  polygenes, 
shared  environment,  and  independent  environment  (residual  effects).  Generational 
differences  and  birth  order  effects  can  be  inferred.  The  number  of  effects  inferred  is 
dependent  on  the  variety  of  relationships  and  so,  in  general,  large  pedigrees  are 
needed  to  adequately  explore  transgenerational  effects.  The  biggest  difficulties  with 
this  approach  are  the  equivalence  or  near  equivalence  of  sets  of  models  and  the 
inability  to  prove  any  model  to  be  true. 

Variance  component  analysis  (e.g.,  Lange  2002)  and  its  related  approach,  path 
analysis (e.g., Thomas 2004) have been used to separate genetic sources of phenotypic
variation from other sources. In the absence of genetic marker data, what is
not  the  effect  of  a  gene  or  genes  (possibly  many)  acting  in  a  Mendelian  fashion  is 
typically assumed to be environmentally induced. These approaches postulate trait

[Figure 11.2: diagrams of the six detailed identity states, labeled $\Delta^*_{9}$, $\Delta^*_{10}$, $\Delta^*_{11}$, $\Delta^*_{12}$, $\Delta^*_{13}$, $\Delta^*_{14}$]

Fig. 11.2 Detailed identity states used in the variance component analyses. Six detailed identity
states describe the IBD sharing between two non-inbred individuals i and j. Four circles in a block
represent the genes from i and j at a single locus. Individual i's genes are on the top and j's genes
are on the bottom. Maternally derived genes are on the left, paternally derived genes on the right.
Lines between genes represent genes that are identical by descent. The probability of observing
state k is denoted by $\Delta^*_{k}$

variation  is  partitioned.  Classically,  genetic  effects  are  modeled  as  many  genes 
acting  approximately  equally  and  independently  (polygenes).  The  additive  genetic 
variance captures the effect of alleles at these genes as if they were acting independently;
deviation from allelic independence leads to dominance genetic variation.
The  genetic  correlation  between  two  relatives'  trait  values  depends  on  the  expected 
distribution  of  genes  shared  identically  by  descent  (IBD)  among  them  (Fig.  11.2). 
Environmental variation is postulated to come in two forms: shared and independent
among the pedigree members. Shared environmental variation captures the
additional  correlations  among  family  members  that  remain  unexplained  by  IBD 
sharing.  Adopted  relatives  and  other  unrelated  individuals  living  in  the  same 
household help to distinguish these correlations from genetic correlations. Independent
environmental variation is any residual variation in the trait values after
accounting  for  genetic  and  shared  environmental  variations.  It  is  always  included 
because  measurement  errors  make  the  model  fit  imperfect. 

Although  it  is  possible  to  use  variance  component  models  to  test  for 
transgenerational  effects  with  only  phenotypes,  this  approach  has  not  been  pursued  to 
any  appreciable  extent.  One  reason  is  that  shared  environment  and  transgenerational 
effects  are  often  confounded,  making  inference  of  transgenerational  effects  difficult. 
Parent  of  origin  effects  provide  an  exception  (Gorlova  et  al.  2007;  Zhou  et  al.  2011). 
PoO  effects  lead  to  a  difference  in  parent-offspring  correlations  depending  on  the 
parent's  sex  and  thus  are  accommodated  by  partitioning  the  additive  genetic  variance 
into  two  separate  effects.  When  there  are  no  shared  environmental  effects  but  there  are 
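parent of origin effects, the variance-covariance matrix for family phenotypes Y can be
written as:

\[
\begin{aligned}
\mathrm{Var}(Y) ={}& (\Delta^*_{9} + \Delta^*_{10})\,\sigma^2_{ma} + (\Delta^*_{9} + \Delta^*_{11})\,\sigma^2_{pa} \\
&+ (2\Delta^*_{12} + \Delta^*_{13} + \Delta^*_{14})\,\mathrm{cov}_{mpa} + \Delta^*_{9}\,\sigma^2_{d} + \Delta^*_{12}\,\mathrm{cov}_{d} + I\,\sigma^2_{e} \qquad (11.1)
\end{aligned}
\]

where $\sigma^2_{ma}$ is the maternal additive genetic variance, $\sigma^2_{pa}$ is the paternal additive
genetic variance, $\mathrm{cov}_{mpa}$ is the additive covariance of the maternal and paternal
alleles, $\sigma^2_{d}$ is the dominance genetic variance, $\mathrm{cov}_{d}$ is the dominance covariance, $I$ is
the identity matrix, $\sigma^2_{e}$ is the independent environmental variance, and $\Delta^*_{i}$ is the
probability of IBD state i (Fig. 11.2).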
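
To see how this partition produces different parent-offspring covariances depending on the parent's sex, the off-diagonal entry for a single pair of relatives can be computed from the pair's identity-state probabilities. The sketch below is an illustration, not the authors' software. It assumes the state numbering implied by the coefficients in the formula: state 9 shares both alleles parent-for-parent, states 10 and 11 share only the maternal or only the paternal alleles, state 12 shares both alleles crosswise, and states 13 and 14 share a single allele crosswise. The function name, dictionaries, and numerical values are hypothetical.

```python
def pair_covariance(delta, var_ma, var_pa, cov_mpa, var_d, cov_d):
    """Phenotypic covariance between two distinct relatives.

    `delta` maps a detailed identity state (9..14) to its probability for the
    pair; states that are absent are taken to have probability zero. The
    independent environmental variance only enters the diagonal, so it does
    not appear here.
    """
    def d(state):
        return delta.get(state, 0.0)

    return ((d(9) + d(10)) * var_ma
            + (d(9) + d(11)) * var_pa
            + (2 * d(12) + d(13) + d(14)) * cov_mpa
            + d(9) * var_d
            + d(12) * cov_d)


# Assumed identity-state probabilities for non-inbred pairs.
MOTHER_CHILD = {10: 0.5, 13: 0.5}  # child's maternal allele matches one of the mother's two alleles
FATHER_CHILD = {11: 0.5, 14: 0.5}  # child's paternal allele matches one of the father's two alleles
FULL_SIBS = {9: 0.25, 10: 0.25, 11: 0.25}
```

With `var_ma = 0.6`, `var_pa = 0.2`, and `cov_mpa = 0.3`, the mother-child covariance is 0.45 and the father-child covariance is 0.25, which is the asymmetry that signals a PoO effect. Setting `var_ma`, `var_pa`, and `cov_mpa` all equal to half the additive variance recovers the classical values for parent-offspring and full-sib pairs.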

Testing  for  maternal  genetic  effects  with  variance  component  models  presents  a
problem,  especially  when  attempting  to  dissect  prenatal  and  postnatal  effects  from 
maternally  inherited  effects.  This  dissection  is  important  because  if  prenatal  effects 
are  not  properly  modeled,  heritability  estimates  are  biased  and  may  lead  to  false 
inference  of  PoO  effects  (Zhou  et  al.  2011).  In  the  absence  of  measured  predictors 
(e.g.,  genotypes),  it  is  impossible  to  estimate  all  three  effects  in  traditional  nuclear 
families  because  the  mother  providing  the  genetic  material  (genetic  mother)  is  the 
same  person  carrying  the  child  (gestational  mother)  and  the  same  person  raising  the 
child  (postnatal  mother).  Although  animal  experimentation  provides  opportunities 
to  dissect  these  effects  by  embryo  transplantation  and  cross  fostering  (e.g.,  Nadeau 
2009),  in  humans  the  options  are  limited.  Adoption  studies  have  been  used  to 
separate  out  postnatal  effects  but  prenatal  and  maternal  inherited  effects  are  still 
confounded.  Comparing  the  offspring  of  sisters  to  the  offspring  of  brothers  separates 
maternal  inheritance  from  prenatal  and  postnatal  effects  (Robson  1955).  However, 
recent  advances  in  assisted  reproductive  technologies  (ART)  provide  ways  to 
separate  all  these  effects  (Thapar  et  al.  2007;  Zhou  et  al.  2011). 

Because  ART  uses  sperm  donation,  egg  donation,  or  gestational  surrogacy 
depending  on  approach,  children's  genetic  parents  can  be  different  from  their 
prenatal  or  postnatal  parents.  By  comparing  phenotypes  within  and  between  these 
families,  it  is  possible  to  separate  maternal  genetic,  prenatal,  and  postnatal  effects 
(Zhou et al. 2011). The key is to modify equation (11.1) by adding a prenatal
household matrix $H_{pre}$, in which offspring born to the same gestational mother are
indicated, and a postnatal household matrix $H_{post}$, in which members with common
environmental exposures are indicated.

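\[
\mathrm{Var}(Y) = (\Delta_{9} + \Delta_{10})\,\sigma^2_{ma} + (\Delta_{9} + \Delta_{11})\,\sigma^2_{pa}
  + (2\Delta_{12} + \Delta_{13} + \Delta_{14})\,\mathrm{cov}_{mpa}
  + \Delta_{9}\,\sigma^2_{d} + \Delta_{12}\,\mathrm{cov}_{d} + I\,\sigma^2_{e}
  + H_{pre}\,\sigma^2_{pre} + H_{post}\,\sigma^2_{post}
\]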

Phenotypes  from  all  the  parents  and  offspring  in  ART  families  can  be  used  to 
estimate these effects. Known risk factors are included as fixed covariates. Zhou
et al. (2011) demonstrate this approach with nuclear families and note that
variance component models can, in principle, use ART pedigrees of any complexity.
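
As a rough illustration of how such a covariance matrix is assembled, the sketch below sums coefficient matrices weighted by their variance components for two siblings from an ART family. The helper names (`weighted_sum`, `household`), the sibling scenario, and every numeric value are assumptions made for illustration and are not taken from Zhou et al. (2011); the two covariance terms are left out for brevity.

```python
def weighted_sum(terms, n):
    """Return the n x n matrix sum of scale * kernel over (scale, kernel) pairs."""
    total = [[0.0] * n for _ in range(n)]
    for scale, kernel in terms:
        for i in range(n):
            for j in range(n):
                total[i][j] += scale * kernel[i][j]
    return total


def household(labels):
    """Indicator matrix: entry (i, j) is 1 when members i and j share a label."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


n = 2  # two full siblings (hypothetical family)

# IBD-based coefficient matrices for full siblings (diagonal = self).
k_maternal = [[1.0, 0.5], [0.5, 1.0]]      # maternal allele shared IBD
k_paternal = [[1.0, 0.5], [0.5, 1.0]]      # paternal allele shared IBD
k_dominance = [[1.0, 0.25], [0.25, 1.0]]   # both alleles shared IBD
identity = household(range(n))

# Sibling 1 was carried by a surrogate, sibling 2 by the genetic mother;
# both were raised in the same home (assumed scenario).
h_pre = household(["surrogate", "genetic_mother"])
h_post = household(["home_A", "home_A"])

# Illustrative variance components (assumed values, not estimates).
var_ma, var_pa, var_d = 0.30, 0.10, 0.04
var_e, var_pre, var_post = 0.40, 0.10, 0.06

V = weighted_sum(
    [(var_ma, k_maternal), (var_pa, k_paternal), (var_d, k_dominance),
     (var_e, identity), (var_pre, h_pre), (var_post, h_post)],
    n,
)
```

In this toy family the sibling covariance `V[0][1]` contains no prenatal component because the two children had different gestational mothers, which is the kind of contrast that makes the prenatal variance separable in ART data.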


11.3.2    Approaches  Using  Phenotype  and  Genotype  Data 

When  genotypes  are  available,  the  possibilities  for  detecting  transgenerational 
effects  improve.  Genotypes  provide  a  causal  anchor  and  allow  dissection  of  genetic 
effects  from  environmental  effects.  Tests  of  the  association  of  parental  genotypes 
with  offspring  phenotype  and  interactions  between  parent  and  offspring  genotypes 
are  possible.  Direct  evidence  for  the  epigenetic  mechanisms  underlying  these 
transgenerational  effects  is  not  obtained  from  these  studies;  however,  support  for 
persistent, shared environment can be reduced or eliminated.

Table 11.1 Joint maternal-offspring genotype relative risks

a The first two columns denote genotypes of the mother and her offspring
b Columns 3 and 4 denote two different parameterizations of the most general model of
  maternal-offspring genotype effects that can be used with a bi-allelic locus
c Column 5 models RHD incompatibility
d Column 6 models NIMA and offspring main effects

e $\delta_{ij}$ is the joint effect of i maternal 2 alleles and j offspring 2 alleles;
  $S_i$ is the main effect of i maternal 2 alleles, $R_j$ is the main effect of j offspring 2 alleles,
  and $\gamma_{ij}$ is the additional interaction effect

Before  embarking  on  a  discussion  of  the  specific  analysis  approaches,  we  note 
that  most  statistical  approaches  assume  that  the  genotype  is  a  SNP.  Thus  we  will 
also focus on a single bi-allelic locus. Readers should be aware, however, that there are a
few methods that allow multi-allelic and multi-locus genotype data (e.g., Chen
et al. 2009; Childs et al. 2011; Hsieh et al. 2006a; Sinsheimer et al. 2003).

CMCM data can be summarized in a two-factor contingency table and analyzed
with a chi-square test or a Fisher exact test. Analysis of CMCM data can incorporate
covariates affecting disease susceptibility by using logistic regression. The
combinations  of  mother-offspring  genotypes  represent  levels  of  one  factor  and 
case-control  status  represents  levels  of  a  second  factor.  Thus  these  data  can  be  used 
to  test  models  regarding  maternal  genotype  main  effects,  offspring  genotype  main 
effects,  and  their  interactions.  Under  the  null  hypothesis  of  no  effect  of  this  locus  on 
disease  susceptibility,  the  two  factors  are  independent  and  the  genotype  frequencies 
for  cases  should  be  the  same  as  the  genotype  frequencies  for  controls. 
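
The independence test described above can be sketched with a plain Pearson chi-square computation. The counts below are invented for illustration, the genotype coding (copies of one allele in mother and offspring) is an assumption, and the function name `chi_square_independence` is hypothetical. The sketch returns the statistic and its degrees of freedom only; a p-value would come from the chi-square distribution with that many degrees of freedom.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two factors.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: the seven mother-offspring genotype combinations at one bi-allelic
# locus, coded as (maternal copies, offspring copies) of one allele.
# Columns: (cases, controls). All counts are made up.
combinations = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)]
cmcm_counts = [
    [30, 40],
    [25, 30],
    [20, 28],
    [45, 50],
    [22, 18],
    [28, 20],
    [30, 14],
]

stat, df = chi_square_independence(cmcm_counts)
print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
```

With sparse cells a Fisher exact test, or logistic regression when covariates are needed, would be the more appropriate choice, as the text notes.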

The  number  of  levels  for  the  first  factor  depends  on  whether  the  maternal  and 
offspring  SNPs  are  at  the  same  locus  or  two  distinct  loci.  When  the  maternal  and 
offspring  loci  are  distinct,  there  are  nine  possible  maternal-offspring  combinations. 
Chen  et  al.  (2009)  proposed  a  likelihood  ratio  test  that  allows  the  inclusion  of 
mother's  and  offspring's  genotypes  at  both  these  loci  and  showed  that  including 
both  increases  overall  information  and  thus  increases  power.  When  considering  the 
same  bi-allelic  locus  for  mother  and  offspring,  the  number  of  maternal-offspring 
genotype  combinations  is  seven  (Table  11.1).  Assumptions  regarding  the 
mechanisms  by  which  these  genotypes  lead  to  disease  or  how  the  maternal  and 
offspring  genotypes  interact  result  in  restrictions  on  the  levels,  further  reducing  the 
number of independent parameters. If paternal-offspring interactions are suspected,
the CMCM design can be changed to a case-father control-father (CFCF) design
and analyzed in the same manner.

Inference  from  CMCM  studies  is  sensitive  to  population  stratification.  Although 
this  can  be  corrected  by  accounting  for  maternal  population  history,  CMCM 
designs  are  limited  in  the  hypotheses  that  can  be  tested.  One  alternative  is  to  use 
case-parent  trios.  Several  analysis  approaches  have  been  used  to  analyze  trios 
depending  on  the  research  questions  under  consideration. 

Parent  of  origin  effects  can  be  detected  using  the  transmission  disequilibrium 
test  (TDT)  (Spielman  et  al.  1993;  Terwilliger  and  Ott  1992).  The  TDT  is  a  form  of 
conditional  logistic  regression  that  uses  a  retrospective  design  where  the  genotype 
of  the  offspring  is  the  dependent  variable  (Sham  and  Curtis  1995;  Sinsheimer 
et  al.  2000;  Thomas  2004).  The  TDT  is  a  test  of  linkage  and  association  between 
a  genetic  marker  and  a  disease  locus.  When  used  with  CPTs,  the  null  hypothesis  is 
no  linkage  or  no  association  and  a  heterozygous  parent  is  equally  likely  to  pass  on 
either  of  their  alleles  to  their  offspring.  If  there  are  linkage  and  association,  one 
allele  will  appear  to  be  transmitted  more  often  to  an  affected  offspring  than  the 
other.  By  comparing  a  model  allowing  for  separate  maternal  and  paternal 
transmissions  to  the  standard  TDT  where  the  maternal  and  paternal  transmissions 
are  the  same,  the  existence  of  PoO  effects  can  be  tested. 
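
A simplified version of this comparison can be sketched as follows. It assumes that each transmission from a heterozygous parent can be assigned unambiguously to the mother or the father (trios in which parental origin cannot be resolved are taken to be excluded) and that transmissions are independent. The function names and counts are hypothetical, and the binomial likelihood ratio used here is a stand-in for, not a reproduction of, the conditional logistic models cited in the text.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """McNemar-type TDT chi-square (1 df) from heterozygous-parent transmissions."""
    return (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)


def binom_loglik(k, n, p):
    """Binomial log-likelihood kernel, treating 0 * log(0) as 0."""
    loglik = 0.0
    if k > 0:
        loglik += k * math.log(p)
    if n - k > 0:
        loglik += (n - k) * math.log(1 - p)
    return loglik


def poo_lrt(mat_t, mat_u, pat_t, pat_u):
    """Likelihood ratio test of equal maternal and paternal transmission rates."""
    n_m, n_p = mat_t + mat_u, pat_t + pat_u
    p_m, p_p = mat_t / n_m, pat_t / n_p
    p_common = (mat_t + pat_t) / (n_m + n_p)
    # Full model: separate maternal and paternal transmission probabilities.
    full = binom_loglik(mat_t, n_m, p_m) + binom_loglik(pat_t, n_p, p_p)
    # Reduced model: one shared probability, as in the standard TDT.
    reduced = binom_loglik(mat_t, n_m, p_common) + binom_loglik(pat_t, n_p, p_common)
    stat = max(2.0 * (full - reduced), 0.0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up counts of transmitted / untransmitted copies of the test allele.
overall = tdt_statistic(40 + 25, 10 + 25)
stat, p_value = poo_lrt(mat_t=40, mat_u=10, pat_t=25, pat_u=25)
print(f"TDT = {overall:.2f}, PoO LRT = {stat:.2f}, p = {p_value:.4f}")
```

A large statistic indicates that maternal and paternal transmission rates differ, which is consistent with a parent of origin effect but, as discussed later in the chapter, does not by itself identify the biological mechanism.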

The  TDT  can  be  further  modified  to  examine  parent-offspring  genotype 
interactions. For example, non-inherited maternal antigen (NIMA) effects, a form
of  MFG  incompatibility  postulated  to  occur  in  rheumatoid  arthritis  (Harney 
et  al.  2003;  Hsieh  et  al.  2006b),  can  be  tested  by  comparing  the  proportion  of 
cases  whose  genotypes  are  incompatible  with  their  mother's  genotype  to  the 
proportion  of  cases  whose  genotypes  are  incompatible  with  their  father's  genotype 
(Harney  et  al.  2003).  The  assumption  underlying  this  analysis  is  that  NIMA  is  a 
plausible  risk  factor  for  a  complex  disease,  but  non-inherited  paternal  antigens 
(NIPA) are not. One major deficit of this design is that it is not possible to
simultaneously  check  for  offspring  genotype  effects,  and  maternal  genotype  effects 
are  confounded  with  MFG  incompatibility.  The  design  also  requires  a  substantial 
number  of  fathers  be  genotyped  to  have  reasonable  power. 

The TDT gains no information from parents with homozygous genotypes,
limiting power. Weinberg et al. (1998) proposed a log-linear model as
an alternative and tested for offspring genetic main effects, maternal genetic main
effects, and parent of origin effects. Sinsheimer et al. (2003) recognized that
the log-linear model could be extended to allow for maternal-offspring gene
interaction at a single locus. Like the TDT, these log-linear models use cases and
their parents in a retrospective design in which the genotypes are the dependent
variables and no controls are necessary. Sinsheimer's MFG test estimates
parameters by maximizing the multinomial likelihood that is equivalent to the
log-linear model, and thus easily accommodates maternal-offspring and
paternal-offspring dyads as incomplete trios.

The MFG (and equivalently the log-linear) test is very flexible, allowing many
inherited disease risk scenarios to be modeled (Ainsworth et al. 2011; Hsieh
et al. 2006a, b, 2007; Minassian et al. 2006). When using CPTs and a single
bi-allelic locus, there are 15 possible offspring-maternal-paternal genotype
combinations. Under the null model of no genetic effects on the phenotype,
Mendelian transmission holds and the number of independent parameters reduces
to eight, one less than the number of maternal-paternal genotype combinations
(mating types). If one assumes the sex of the parent is irrelevant in determining the
probability of the mating types (the symmetric mating assumption), then the nine
combinations reduce to six. If random mating with regard to the locus holds, then
the mating types can be parameterized in terms of the three genotype frequencies,
leaving two independent parameters to estimate. The number of independent
parameters under the null further reduces to one if Hardy-Weinberg equilibrium is
assumed.
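
The counts quoted above can be checked by brute-force enumeration. The sketch below codes each genotype as the number of copies of one allele (an assumed coding) and lists every offspring genotype that Mendelian transmission allows for each mating type; all names are illustrative.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # copies of one allele at a bi-allelic locus


def gametes(genotype):
    """Allele counts (0 or 1) that a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def possible_offspring(mother, father):
    """Offspring genotypes compatible with Mendelian transmission."""
    return {m + f for m in gametes(mother) for f in gametes(father)}


# Ordered (mother, father) mating types.
mating_types = list(product(GENOTYPES, repeat=2))

# Offspring-maternal-paternal genotype combinations seen in case-parent trios.
trios = [(c, m, f) for m, f in mating_types
         for c in sorted(possible_offspring(m, f))]

# Mating types when the sex of the parent is ignored (symmetric mating).
symmetric_mating_types = {tuple(sorted(mt)) for mt in mating_types}

# Mother-offspring genotype combinations at the same locus.
mother_offspring = {(m, c) for c, m, f in trios}

print(len(trios), len(mating_types),
      len(symmetric_mating_types), len(mother_offspring))
# prints: 15 9 6 7
```

The last count, seven, is the number of maternal-offspring genotype combinations used for CMCM data and in Table 11.1.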

Maternal  and  offspring  genotype  effects  are  estimated  as  genotype  relative  risks 
along  with  mating-type  frequencies.  Table  11.1  presents  the  same  mother-offspring 
combinations  and  genotype  relative  risks  for  a  bi-allelic  locus  as  can  be  modeled 
with  CMCM  data.  Columns  3  and  4  present  two  mathematically  equivalent 
parameterizations  for  the  most  general  model  of  maternal-offspring  effects  with  a 
bi-allelic  locus.  Although  column  3,  the  joint  risk  model,  has  seven  parameters,  the 
maximum  number  of  maternal-offspring  parameters  that  can  be  estimated  is  six 
because  one  of  these  joint  risks  is  the  referent  with  value  one.  Column  4  is 
parameterized  in  terms  of  maternal  main  effects,  offspring  main  effects,  and  two 
MFG  incompatibilities.  We  also  present  two  examples  of  restrictions.  The  model  in 
column  5  represents  RHD  incompatibility  without  offspring  or  maternal  main 
effects.  Column  6  represents  NIMA  effects  along  with  offspring  genotype  effects 
(Hsieh  et  al.  2007).  All  of  these  models  are  available  for  testing  in  the  MFG  option 
of  the  Mendel  Statistical  Genetics  Software  Package  (Lange  et  al.  2013). 

The  log-linear  and  equivalent  multinomial  approaches  can  also  test  for  the 
existence  of  PoO  effects  in  the  possible  presence  of  maternal  and  offspring  effects 
(Ainsworth  et  al.  2011;  Weinberg  et  al.  1998).  These  authors  caution  against  over- 
parameterization  and  discuss  the  problem  of  multiple  interpretations. 

Although  CPTs  have  much  to  offer,  many  families  have  multiple-affected 
offspring  and  including  only  one  of  these  offspring  is  inefficient.  In  order  to  use 
any  number  of  affected  siblings  per  family,  Kraft  et  al.  (2004)  used  a  conditional 
retrospective  likelihood  approach.  This  approach  finds  the  likelihood  of  the 
genotypes  conditional  on  the  affection  status  of  the  siblings  and  can  estimate  the 
offspring, maternal (or paternal) genotype effects, and their interactions by including
these effects in the penetrance function. Families where one or both parents
have  missing  genotypes  are  included  in  the  likelihood  by  summing  over  all  possible 
genotypes  for  the  missing  parents.  Unaffected  siblings  are  treated  as  phenotype 
unknown.  The  genotypes  of  these  unaffected  or  phenotype  unknown  offspring  can 
be  included  in  the  likelihood  to  help  infer  the  possible  genotypes  for  missing 
parents  without  introducing  any  bias  provided  the  disease  is  not  too  common 
(Hsieh  et  al.  2006a).  If  the  locus  under  study  is  causal,  is  unlinked  to  other  causal 
loci  and  there  are  no  gene-environment  interactions,  then  the  penetrance  functions 
of  the  offspring  are  independent  conditional  on  their  own  genotype  and  that  of 
their mothers. The maximum likelihood estimates of the relative risks and the
mating-type frequencies are obtained by solving score equations of the sample log
likelihood and the standard errors of the estimates are derived through the observed
information matrix (Lange 2002). Null hypotheses are tested using likelihood ratio
test statistics.

Fig. 11.3 The retrospective likelihood for a single, arbitrary pedigree. With multigenerational
pedigrees, the number of mating types becomes impractically large, so founders' genotypes are
used under an assumption of random mating. Founder $f$'s genotype frequency = $\mathrm{Prior}(g_f)$. These
frequencies are estimated along with the other parameters in the likelihood. With CPTs or
two-generation nuclear families, the $\mathrm{Prior}(g_f)$'s are replaced with mating-type frequencies
$\mathrm{MT}(g_r, g_s)$. $\Pr(G_i \mid g_i) = 1$ if the proposed genotype for any pedigree member $i$, $g_i$, is
consistent with the observed genotype $G_i$, and 0 otherwise. When $G_i$ is missing, $\Pr(G_i \mid g_i) = 1$.
$\Pr(D_c \mid g_c, g_r)$ is offspring $c$'s disease probability, which depends on both their own and their
mother's genotype. For computational ease, $\Pr(D_c \mid g_c, g_r)$ is calculated with
$\mathrm{Trans}(g_c \mid g_r, g_s)$, the transmission probability for offspring, mother, and father triples
$(c, r, s)$. The denominator sums over all possible ordered (phased) genotypes for the $n$ family
members. The likelihoods of independent pedigrees multiply. When there are only CPTs, the
denominator is constant and is not relevant to the inference. The likelihood of the study
samples is then proportional to a 15-category multinomial

Besides  allowing  more  data  to  be  used,  an  advantage  of  using  nuclear  families  is 
that  prior  exposure  effects  can  be  tested.  In  this  case,  the  genotypes  of  unaffected  or 
phenotype  unknown  siblings  fulfill  an  additional  role  of  defining  prior  exposure. 
Kraft  et  al.  (2004)  used  nuclear  families  to  test  whether  risk  of  schizophrenia 
increased  for  offspring  who  were  RHD  incompatible  when  their  older  sibling  was 
also  RHD  incompatible  and  found  support  for  this  hypothesis. 

This  conditional  retrospective  likelihood  approach  can  be  extended  for  use  with 
large  pedigrees.  Like  the  nuclear  family  test,  the  extended  MFG  incompatibility 
(EMFG)  test  examines  both  maternal  and  offspring  genotypes  as  risk  factors  for 
disease. The EMFG test jointly models maternal genotype effects, offspring genotype
effects, and maternal-offspring genotype interactions using a retrospective
likelihood  (see  Fig.  11.3  for  mathematical  details).  Childs  et  al.  (2010,  2011) 
developed  this  approach  to  allow  any  pedigree  to  be  used  including  those  with 
multiple  generations  and  multiple-affected  individuals.  To  reduce  the  number  of 
nuisance  parameters,  the  EMFG  test  replaces  mating  types  with  founder  genotypes 
and assumes random mating with respect to genotypes among the founders. The
EMFG  test  handles  multi-allelic  loci,  including  non-codominant  loci  and  several 
tightly  linked  loci,  and  can  also  incorporate  potential  offspring-related  confounders. 
The  EMFG  test  likelihood  uses  the  classic  formulation  of  the  pedigree  likelihood 
(Ott  1974)  and  modifies  it  by  (1)  conditioning  on  the  phenotypes  and  (2)  using 
penetrance  functions  that  depend  on  both  the  offspring  and  maternal  genotypes 
(Fig.  11.3).  Each  pedigree  has  its  own  conditional  likelihood  and  these  conditional 
likelihoods  multiply.  Unaffected  family  members  are  treated  as  phenotype 
unknown. Although EMFG is an affected-only analysis, the genotypes of the
unaffected or phenotype unknown offspring are used when there are missing
parental genotypes.

The  conditional  likelihood  used  by  Kraft  et  al.  (2004)  and  Childs  et  al.  (2010, 
2011) has additional attractive features. Under relatively mild assumptions regarding
the conditional independence of environmental exposure and offspring
genotypes given parental genotypes, the effects of environmental covariates can
be incorporated into the models. Serotypes and other non-codominant markers can
be used by treating the genotypes underlying these phenotypes as missing data
(Minassian et al. 2006). The conditional likelihood approach also has some
disadvantages. When the locus under study is not the causal locus but is linked to
the causal locus, the variance of the parameter estimates is underestimated, which
leads to false-positive results unless a robust variance estimator is used (Kraft
et al. 2005).

Although  these  single  locus  bi-allelic  analyses  provide  insights,  biological 
inference  is  limited.  For  example,  the  models  discussed  in  the  previous  paragraphs 
assume  that  there  are  joint  maternal-offspring  genotype  effects  but  no  paternal 
genotype  effects.  With  the  same  data,  we  could  have  equally  plausibly  tested  for 
joint  paternal-offspring  genotype  effects  or  main  effects  of  maternal,  paternal,  and 
offspring  genotypes.  In  fact  there  can  be  multiple  mathematically  equivalent 
parameterizations  that  have  different  biological  interpretations.  Although  null 
hypotheses  may  be  rejected,  the  statistical  analyses  cannot  provide  insights  into 
which  biological  interpretation  is  the  correct  one.  Thus  it  is  important,  when  using 
these  models  for  gene  discovery,  not  to  take  the  results  of  any  parameterization  too 
literally  and  recognize  a  number  of  alternative,  equally  plausible  explanations  may 
exist  (see  Sinsheimer  et  al.  2003  and  Ainsworth  et  al.  2011  for  details). 


11.3.3    Approaches  Using  Phenotype,  Genetic, 
and  Epigenetic  Data 

Currently  epigenetic  data  are  scarce  in  epidemiological  studies,  particularly  at  the 
genome-wide  level.  The  most  commonly  available  genome-wide  epigenetic  data 
are  DNA  methylation  profiles  (Cortessis  et  al.  2012).  Studies  collecting  these 
methylation  profiles  are  still  small  in  scale,  chiefly  because  of  the  expense.  The 
majority  of  studies  use  samples  from  unrelated  individuals.  Studies  of  relatives 
have  mainly  consisted  of  twin  studies  (Bell  and  Spector  2012;  Bocklandt 
et  al.  2011).  The  predominant  use  of  twin  studies  is  due  to  (1)  the  strong  tradition 
of  using  twins  in  heritability  studies,  which  provides  a  wealth  of  readily  available 
analysis  tools;  (2)  the  expense  of  using  full  pedigree  data;  and  (3)  changes  in  DNA 
methylation  profile  over  the  course  of  an  individual's  lifetime  making  comparisons 
of  relatives  across  generations  more  complicated  than  using  twins. 

The data are often expressed as the fraction of DNA copies in a sample that are
methylated at a specific CpG site (see Laird 2010 for a review of technologies). In
statistical analyses, this fraction, called the beta value, is sometimes treated as an
outcome (the ultimate phenotype of interest), and sometimes treated as an intermediate
phenotype associated with an outcome. When treating the beta value as an outcome, all the
existing quantitative trait analysis approaches, for data from either unrelated
or related individuals, can be used, including penalized regression
(Bocklandt et al. 2011). The heritability of beta values can be calculated by using
pedigrees as well as by using twins (Bjornsson et al. 2008). One potential complication
with pedigree data is the strong age dependence of the beta values at many
DNA methylation sites (Bjornsson et al. 2008; Bocklandt et al. 2011); however, in
analogy to age dependence for clinical outcomes, age can be included as a covariate
(e.g., Watanabe et al. 1999; Kangas-Kontio et al. 2010).

A  beta  value  can  also  be  an  intermediate  phenotype  (like  a  biomarker)  of  an 
outcome.  Again  there  are  statistical  genetic  methods  that  can  use  unrelated  or 
related  individuals  and  treat  beta  values  as  intermediate  phenotypes  in  association 
studies  (Cortessis  et  al.  2012).  One  question  following  from  these  association 
studies  is:  are  these  epigenetic  changes  causal  or  are  they  responses  to  the  clinical 
phenotype? Statistical approaches for inferring causality such as Mendelian
randomization (Thomas and Conti 2004), genetical genomics (Li et al. 2005), and
structural equation modeling (Morris et al. 2010) provide frameworks for answering
this question. Using genetic loci associated with the beta values can anchor the
causal direction. These three statistical approaches are somewhat related and, for
space considerations, we focus on Mendelian randomization, as it has been used
most frequently for epigenetic explorations.

In  the  epigenetic  context,  Mendelian  randomization  resolves  the  question  of 
directionality  between  beta  value  and  an  outcome  by  examining  the  effect  of 
introducing  a  genetic  covariate,  a  proxy,  into  the  analysis  (Thomas  and  Conti 
2004).  The  assumption  is  that  this  proxy  is  directly  related  to  the  beta  value,  but 
it  is  only  indirectly  related  to  the  outcome.  Thus  the  magnitude  of  the  true  causal 
effect  of  methylation  at  the  CpG  site  on  the  outcome  is  the  ratio  of  the  magnitude  of 
effect  of  the  genotype  on  the  outcome  divided  by  the  magnitude  of  the  effect  of 
genotype  on  the  beta  value. 
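
The ratio described above can be sketched with ordinary least-squares slopes. The data below are constructed by hand so that the true effect of the beta value on the outcome is 5 and an unmeasured confounder `u` is uncorrelated with genotype; all names and numbers are assumptions for illustration, and a real analysis would also need standard errors and checks of the proxy assumptions.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def ratio_estimate(genotype, beta_value, outcome):
    """Effect of genotype on outcome divided by effect of genotype on beta value."""
    return ols_slope(genotype, outcome) / ols_slope(genotype, beta_value)


# Toy data: genotype coded as allele count, u is an unmeasured confounder
# that affects both the beta value and the outcome but not the genotype.
genotype = [0, 0, 1, 1, 2, 2]
u = [1, -1, -1, 1, 1, -1]
beta_value = [0.20 + 0.10 * g + 0.02 * c for g, c in zip(genotype, u)]
outcome = [1.0 + 5.0 * b + 0.5 * c for b, c in zip(beta_value, u)]

mr_estimate = ratio_estimate(genotype, beta_value, outcome)
naive_estimate = ols_slope(beta_value, outcome)
print(f"ratio estimate = {mr_estimate:.3f}, naive regression = {naive_estimate:.3f}")
```

In this constructed example the ratio estimate recovers the effect built into the data, whereas the direct regression of outcome on beta value is inflated by the confounder, which is the motivation for using a genetic proxy.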

Because  other  measured  covariates  such  as  age,  sex,  body  mass  index,  or  specific 
biomarkers  like  lipid  levels  can  be  associated  with  both  the  beta  value  and  outcome, 
it  may  be  hard  to  discern  causality.  For  example,  suppose  there  is  an  association  of 
age  with  the  clinical  phenotype,  and  there  is  an  association  of  age  with  the  beta 
value  at  a  specific  CpG  site.  Is  the  age  effect  for  the  phenotype  manifested  through 
DNA  methylation?  One  promising  approach  to  answering  this  question  is  two-step 
Mendelian  randomization  (Relton  and  Davey  Smith  2012).  In  this  context,  the 
biomarkers,  age,  etc.  constitute  exposures.  In  the  first  step,  a  genetic  proxy 
associated  with  the  exposure  is  used  to  determine  the  causality  of  the  exposure 
for  the  beta  value.  In  the  second  step,  a  different  genetic  proxy,  independent  of  the 
first  proxy  and  associated  with  the  beta  value,  is  used  to  determine  the  causality  of 
DNA methylation at the CpG site for the outcome.

Of particular relevance to understanding transgenerational effects is that
Mendelian randomization can be applied to family data (e.g., Morris et al. 2009).
Two-step Mendelian randomization can also span generations. Relton and Davey
Smith (2012) discuss the example of maternal alcohol use during pregnancy as
the exposure, offspring methylation fraction at a particular CpG site as the intermediate
phenotype, and offspring cognition as the outcome. In this case, an appropriate
genetic proxy for alcohol consumption is the mother's genotype at an associated
locus and an appropriate genetic proxy for the beta value is the offspring's genotype
at another locus, unlinked to and independent of the first locus.


11.4  Discussion 


Epidemiological  study  designs  and  statistical  genetic  approaches  make  it  possible 
to  detect  transgenerational  effects  in  humans.  Table  11.2  summarizes  the  study 
samples presented and the nature of transgenerational effects that can be determined
using them. Determining the correct form of the transgenerational effects
using  the  epidemiological  studies  is  difficult  but  the  more  genetic  and  epigenetic 
information  available,  the  better  the  chances  of  differentiating  between  the 
possibilities.  Researchers  need  to  be  mindful  that  even  detailed  epigenetic  data 
are  of  limited  value  if  the  study  design  is  inadequate.  The  model  complexity  cannot 
exceed  what  is  possible  given  the  study  sample.  For  example,  none  of  these 
transgenerational  effects  can  be  tested  if  the  study  sample  is  limited  to  unrelated 
cases  and  controls. 


Table 11.2 Examples of study samples and research questions

                     |------------------------- Genetic effect -------------------------|
                                                                                           Prenatal,
                                                     Parent of        Long-range           postnatal,
Study sample         Offspring  Maternal  Paternal   origin     MFG   transgenerational    maternal inherited
Case-control         Yes        No        No         No         No    No                   No
CMCM                 Yes        Yes       No         No         No    No                   No
CFCF                 Yes        No        Yes        No         No    No                   No
CPT                  Yes        Yes       Yes        Yes        Yes   No                   No
Nuclear families     Yes        Yes       Yes        Yes        Yes   No                   No
Extended pedigrees   Yes        Yes       Yes        Yes        Yes   Yes                  Yes^a
Families using ART   Yes        Yes       Yes        Yes        Yes   Yes                  Yes

CMCM case-mother, control-mother study sample; CFCF case-father, control-father study
sample; CPT case-parent trio study sample

a Yes, if adopted offspring and offspring of sisters are included



Researchers should remember when analyzing data under particular hypotheses
that parameterizations with different biological interpretations can be
mathematically equivalent or nearly equivalent. They should also remember that
violation of the underlying (and sometimes unstated) modeling assumptions may
lead  to  rejection  of  the  null  hypothesis  without  the  alternative  hypothesis  actually 
being  true.  For  example,  violation  of  the  symmetric  mating  assumption  will  lead  to 
false  inference  of  maternal  effects  when  analyzing  genotype  data  with  CPTs 
(Sinsheimer  et  al.  2003).  When  possible,  these  modeling  assumptions  should  be 
checked.  Independent  mechanistic  data  from  functional  studies,  in  vitro  or  using 
model  organisms,  will  be  needed  to  move  beyond  these  associations  and  resolve 
these  alternative  explanations.  Despite  these  caveats,  epidemiological  data  still 
provide  us  with  strong  evidence  in  support  of  the  existence  of  transgenerational 
genetic  effects  in  humans  and  their  roles  in  complex  disease.  Moreover  they 
generate  hypotheses  for  further  research  into  the  mechanisms  of  these 
transgenerational effects.

Chapter 12

Transmission  Ratio  Distortion:  A  Neglected 

Phenomenon  with  Many  Consequences 

in  Genetic  Analysis  and  Population  Genetics 

Aurelie  Labbe,  Lam  Opal  Huang,  and  Claire  Infante-Rivard 


Abstract  Transmission  ratio  distortion  (TRD)  is  defined  as  a  statistical  departure 
from  the  Mendelian  1:1  inheritance  ratio  and  occurs  when  one  of  the  two  alleles 
from  either  parent  is  preferentially  transmitted  to  the  offspring  (Pardo-Manuel  de 
Villena  and  Sapienza  2001).  This  phenomenon  is  conventionally  assessed  by  the 
transmission  disequilibrium  test  (TDT)  (Spielman  et  al.  1993),  which  measures  the 
departure  from  the  expected  transmission  of  an  allele  from  heterozygous  parents  to 
affected  offspring.  In  such  cases,  a  departure  from  Mendelian  ratios  suggests  the 
presence  of  linkage  and  association  between  the  allele  and  the  offspring  condition. 
The  TDT  and  other  family-based  tests  of  transmission  for  linkage  disequilibrium 
and  association  have  been  used  extensively  as  one  way  to  provide  validation  for 
case-control  results  while  controlling  for  population  structure  bias.  However,  TRD 
has  also  been  empirically  observed  in  offspring  unselected  for  disease  (Infante- 
Rivard  and  Weinberg  2005;  Naumova  et  al.  1998;  Paterson  et  al.  2003,  2009; 
Zollner  et  al.  2004),  which  suggests  the  occurrence  of  the  TRD  phenomena  in 
apparently  unaffected  populations.  Although  its  extent  in  the  human  genome  is  not 
yet  well  known,  it  has  also  been  extensively  identified  in  other  species  such  as  mice 
(LeMaire-Adkins and Hunt 2000; Lyon 2003; Wu et al. 2005), Drosophila (Novitski
1951;  Sturtevant  1936;  Zimmering  1955),  and  lesser  kestrel  (Aparicio  et  al.  2010). 
Many  of  the  reported  TRD  loci  play  a  role  in  tumor  suppression  and  have  been 
found  in  colon  cancer,  leukemia,  bladder  cancer,  intestinal  adenoma,  node-positive 
breast cancer, and other cancers (De Rango et al. 2007; Eaves et al. 1999; Naumova
et al. 2001; Paterson et al. 2009). A number of TRD loci are within gene regions
responsible for imprinting (Eversley et al. 2010; Naumova et al. 2001; Yang
et al. 2008), such as D12Nds2 on chromosome 12 and H19 on 11p15.5, leading
to loss of imprint and embryonic lethality. Many TRD loci have also been linked to
abnormal development in neurogenesis, neuronal differentiation, and other cognitive
functions in the central and peripheral nervous system (De Rango et al. 2007;
Eversley  et  al.  2010;  Naumova  et  al.  2001;  Paterson  et  al.  2009;  Paterson  and 
Petronis  1999;  Riess  et  al.  1997). 


12.1  Introduction 


Transmission  ratio  distortion  (TRD)  is  defined  as  a  statistical  departure  from  the 
Mendelian  1 : 1  inheritance  ratio  and  occurs  when  one  of  the  two  alleles  from  either 
parent is preferentially transmitted to the offspring (Pardo-Manuel de Villena and
Sapienza  2001).  This  phenomenon  is  conventionally  assessed  by  the  transmission 
disequilibrium  test  (TDT)  (Spielman  et  al.  1993),  which  measures  the  departure 
from  the  expected  transmission  of  an  allele  from  heterozygous  parents  to  affected 
offspring.  In  such  cases,  a  departure  from  Mendelian  ratios  suggests  the  presence  of 
linkage and association between the allele and the offspring condition. The TDT
and other family-based tests of transmission for linkage disequilibrium and association
have been used extensively as one way to provide validation for case-control
results  while  controlling  for  population  structure  bias.  However,  TRD  has  also  been 
empirically  observed  in  offspring  unselected  for  disease  (Infante-Rivard  and 
Weinberg  2005;  Naumova  et  al.  1998;  Paterson  et  al.  2003,  2009;  Zollner 
et al. 2004), which suggests the occurrence of the TRD phenomenon in apparently
unaffected populations. Although its extent in the human genome is not yet well
known, it has also been extensively identified in other species such as mice
(LeMaire-Adkins and Hunt 2000; Lyon 2003; Wu et al. 2005), Drosophila (Novitski
1951;  Sturtevant  1936;  Zimmering  1955),  and  lesser  kestrel  (Aparicio  et  al.  2010). 
Many  of  the  reported  TRD  loci  play  a  role  in  tumor  suppression  and  have  been 
found  in  colon  cancer,  leukemia,  bladder  cancer,  intestinal  adenoma,  node-positive 
breast  cancer,  and  other  cancers  (De  Rango  et  al.  2007;  Eaves  et  al.  1999;  Naumova 
et al. 2001; Paterson et al. 2009). A number of TRD loci are within gene regions
responsible for imprinting (Eversley et al. 2010; Naumova et al. 2001; Yang
et al. 2008), such as D12Nds2 on chromosome 12 and H19 on 11p15.5, leading
to loss of imprint and embryonic lethality. Many TRD loci have also been linked to
abnormal development in neurogenesis, neuronal differentiation, and other cognitive
functions in the central and peripheral nervous system (De Rango et al. 2007;
Eversley  et  al.  2010;  Naumova  et  al.  2001;  Paterson  et  al.  2009;  Paterson  and 
Petronis  1999;  Riess  et  al.  1997). 

Not all TRD mechanisms are well understood yet. They include germline
selection during mitosis of germ cells, meiotic drive during female meiosis, gametic
competition of sperm to achieve fertilization, and embryo lethality due to
deleterious genotype or mother-fetal incompatibility. Furthermore, some epigenetic
mechanisms underlying genomic imprinting have also been identified, such as
imprint resetting errors or faulty imprint maintenance at fertilization or in the early
embryonic development stage.

Since TRD involves a deviation from the Mendelian 1:1 ratio of allelic transmission
from parents to offspring, it can only be measured in family-based studies.
However, the presence of TRD in populations unselected for disease has a strong
impact on the interpretation of results from family-based linkage and association
studies designed to detect departure from the expected sharing or transmission of
marker alleles, respectively (Greenwood and Morgan 2000; Paterson et al. 2003,
2009;  Zollner  et  al.  2004).  When  TRD  exists  in  a  population  unselected  for  disease, 
a  linkage  or  association  signal  with  the  TRD  locus  would  be  detected  in  case 
samples  even  when  there  is  no  linkage  or  association  present.  By  inflating  or 
attenuating  the  linkage  or  association  signals,  the  presence  of  TRD  can  therefore 
lead to false-positive or false-negative allele-sharing or TDT-like test results and
induce  significant  power  loss  (Greenwood  and  Morgan  2000).  These  aspects  have 
not  been  sufficiently  emphasized  in  the  literature  and  will  be  addressed  in  this 
chapter. 

The  TRD  phenomenon  also  has  implications  for  developmental  genetics.  When 
TRD repeatedly occurs over many generations, the frequencies of the allele favored
in the selection, and of the alleles at nearby loci, begin to shift upwards in the
population  (Chevin  and  Hospital  2006);  as  a  consequence,  the  disadvantaged  allele 
at  the  TRD  locus  gradually  becomes  rare  in  the  population.  Although  the  impact  of 
TRD  in  the  search  for  disease-associated  rare  variants  has  not,  to  our  knowledge, 
been  investigated  in  the  literature  yet,  this  chapter  addresses  the  issue  of  the  link 
between  TRD  loci  and  rare  variants.  With  the  advent  of  high  throughput  sequencing 
of  whole-exomes  or  whole-genomes,  the  focus  is  now  on  rare  disease-causing 
variants.  However,  since  such  variants  are  likely  to  be  seen  only  once  or  twice  in 
samples  of  thousands  of  unrelated  individuals,  family  studies  are  enjoying  a 
resurgence  in  popularity.  Indeed,  from  the  basic  principle  of  inheritance,  rare 
disease-causing  variants  are  likely  to  be  seen  in  multiple  members  of  a  family 
with  a  high  prevalence  of  the  disease.  In  this  context,  studying  the  impact  of  TRD 
on  the  identification  of  rare  variants  in  family-based  studies  is  very  relevant  and 
may  provide  insights  into  the  interpretation  of  family-based  association  study 
results. 

This  chapter  is  divided  into  six  sections.  In  the  first  section,  a  review  of  study 
designs  and  statistical  methods  to  detect  TRD  is  presented,  with  a  particular 
emphasis  on  the  underlying  biological  mechanism.  In  the  second  and  third  sections, 
we  revisit  the  TRD  phenomenon,  by  considering  it  as  a  confounding  signal  in 
linkage or association studies. Section 12.4 includes a simulation study underlining
the importance of control samples to detect and separate TRD signal from association
or linkage signal in affected offspring samples. The fifth section addresses TRD
from a population genetics perspective and presents the results of a simulation study
investigating the link between TRD and rare variants. Finally, Sect. 12.6 presents a
case study on thrombophilic gene variants showing how TRD can mask the
association between these variants and the outcome of small-for-gestational-age
babies,  and  how  by  obtaining  TRD  estimates  from  control  samples  one  can  recover 
association  signals  that  would  be  left  undetected  otherwise. 

12.2    TRD  Inference:  Study  Designs  and  Methods 

TRD  is  the  result  of  disruptive  mechanisms  during  the  gametic  or  embryonic 
development  stages  (see  Huang  (2013)  for  a  review  of  such  mechanisms).  These 
TRD  mechanisms  lead  to  differential  survival  in  embryos  and  include  germline 
selection  during  mitosis  of  germ  cells  (Hastings  1991),  meiotic  drive  during  female 
meiosis  (Pardo-Manuel  de  Villena  and  Sapienza  2001),  gametic  competition  of 
sperm to achieve fertilization (Zollner et al. 2004), embryo lethality due to deleterious
genotype or mother-fetal incompatibility (Zollner et al. 2004), as well as
imprint  resetting  error  or  faulty  imprint  maintenance  at  fertilization  or  in  early 
embryonic  development  stage  (Naumova  et  al.  1995,  2001;  Yang  et  al.  2008). 

If  the  search  for  TRD  loci  is  unrelated  to  a  specific  disease  but  rather  the  primary 
research  goal,  families  with  offspring  unselected  for  phenotype  or  disease  should  be 
genotyped.  Depending  on  the  underlying  biological  mechanism,  TRD  can  be 
observed in different family structures ranging from two-generation families to
larger  multigenerational  families.  These  different  scenarios  are  reviewed  in  detail  in 
the  following  sections. 

12.2.1    Detecting  TRD  Using  a  Transmission  Disequilibrium 
Test  (TDT)  Approach  in  Trios  with  Offspring 
Unselected  for  Phenotype 

Departure from the expected transmission probabilities of an allele from heterozygous
parents to offspring is conventionally measured with the TDT in a sample of
trios (parents and their offspring) (Spielman et al. 1993). Consider for example a
TRD locus with 2 alleles, D and d, where the allelic transmission ratio from parent
to unaffected offspring is D:d = k:1 (i.e., the D allele is transmitted k times more
often than the d allele). Assuming two heterozygote parents with Dd genotype, the
expected proportion of offspring genotypes is given in Table 12.1.
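
The expected proportions follow from treating the two parental transmissions as independent: if each Dd parent transmits D with probability k/(k + 1), the offspring genotypes occur with the product probabilities. The following is a minimal sketch of that arithmetic; the function name is hypothetical and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

def offspring_genotype_proportions(k):
    """Expected (DD, Dd, dd) proportions for two Dd parents when D:d = k:1."""
    k = Fraction(k)
    p_D = k / (k + 1)          # probability that a Dd parent transmits D
    p_d = 1 - p_D              # probability that a Dd parent transmits d
    return p_D * p_D, 2 * p_D * p_d, p_d * p_d

for k in ("1", "3/2", "2", "3"):
    prop_DD, prop_Dd, prop_dd = offspring_genotype_proportions(k)
    print(f"k = {k}: DD = {prop_DD}, Dd = {prop_Dd}, dd = {prop_dd}")
```

Fractions are printed in lowest terms, so 6/16 appears as 3/8.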


Table 12.1 Distribution of offspring genotype proportions for different values of TRD ratio

                                          Offspring genotype
TRD ratio                                 DD             Dd              dd
k = 1 (Mendelian transmission)            1/4 = 0.25     1/2 = 0.5       1/4 = 0.25
k = 1.5 (TRD with ratio D:d = 1.5:1)      9/25 = 0.36    12/25 = 0.48    4/25 = 0.16
k = 2 (TRD with ratio D:d = 2:1)          4/9 = 0.44     4/9 = 0.44      1/9 = 0.11
k = 3 (TRD with ratio D:d = 3:1)          9/16 = 0.56    6/16 = 0.375    1/16 = 0.0625


Transmission counts for one family (parents Dd and DD, offspring DD)

                   D non transmitted    d non transmitted
D transmitted      0                    1
d transmitted      0                    0

TDT in a sample of n = 90 families with heterozygote parents

                   D non transmitted    d non transmitted    Total
D transmitted      a = 0                b = 120              120
d transmitted      c = 60               d = 0                60
Total              60                   120                  2n = 180

The TDT tests for the null hypothesis of Mendelian allelic transmission D:d = 1:1.

Null hypothesis $H_0: \frac{b}{b+c} = \frac{c}{b+c} = 0.5$; $\chi^2 = \frac{(b-c)^2}{b+c} = 20$; P-value is $7.7 \times 10^{-6}$

Fig. 12.1 Illustration of the TDT

In  this  example,  observed  offspring  genotypes  do  not  obey  the  Mendelian  ratio 
when  k  >  1,  leading  to  a  departure  from  the  expected  distribution.  This  form  of 
TRD  can  be  explained  by  a  collection  of  biological  mechanisms  referred  to  as 
germline  selection  occurring  during  mitosis  or  by  embryo  lethality.  Germline 
selection refers to mechanisms such as mutation, recombination, and gene conversion,
which cause cells with certain genotypes to be produced at a higher proportion
than  others.  Hence,  germ  cells  entering  the  meiosis  stage  have  an  imbalanced 
genotype  ratio.  When  trio  samples  are  collected,  the  over-transmission  of  a  marker 
allele  from  heterozygote  mothers  or  fathers  is  tested  using  the  TDT,  which  is 
essentially  a  McNemar  statistical  test  (Liddell  1976).  This  process  is  illustrated  in 
Fig.  12.1. 
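
The McNemar-type TDT statistic uses only the two discordant counts: b, the number of heterozygous parents transmitting D, and c, the number transmitting d. The sketch below computes the statistic and its one-degree-of-freedom p-value for the counts shown in Fig. 12.1 (b = 120, c = 60), for which the figure reports a statistic of 20 and a P-value of 7.7 × 10^-6. The function name is hypothetical; the p-value uses the identity that the upper tail of a chi-square variable with 1 df equals erfc(sqrt(x/2)).

```python
import math

def tdt(b, c):
    """McNemar-type TDT statistic and 1-df chi-square p-value.

    b: heterozygous parents transmitting D (and not d)
    c: heterozygous parents transmitting d (and not D)
    """
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p_value = tdt(b=120, c=60)
print(f"chi-square = {stat:.1f}, p = {p_value:.2g}")
```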

Over-transmission  of  a  marker  allele  from  parents  to  offspring  can  also  occur  in 
a sex-of-parent-specific manner, which can be explained by female meiotic drive or
by gametic competition. Meiotic drive occurs when a haplotype with structural
advantage tends to be transmitted more often during meiosis. Gametic competition (also
called gametic selection) refers to the competition of sperm surviving through
meiotic  drive  to  achieve  fertilization.  In  principle,  these  TRD  mechanisms  can  be 
uncovered  using  the  TDT  with  trios  where  over-transmission  within  strata  of 
heterozygote  mothers  or  heterozygote  fathers  is  tested  using  a  McNemar  test. 
However,  when  both  parents  are  heterozygous,  TDT  on  mothers  versus  TDT  on 
fathers  is  no  longer  a  valid  test  due  to  lack  of  statistical  independence  of 
transmissions (Weinberg 1999). Other tests have been proposed for determining
parent-of-origin effects, such as the Transmission Asymmetry Test (TAT) (Weinberg
et al. 1998), the Likelihood Ratio Test (LRT) (Weinberg 1999), and the Parental
Asymmetry Test (PAT) (Weinberg 1999; Zhou et al. 2009). However, these tests
require the absence of prenatal maternally mediated effects, defined as the effect of
maternal genotype on outcome. This requirement is justified by the fact that
maternal effects can cause differential weighting of the maternal and paternal
transmissions. For the scenario where diseases are subject to prenatal maternally
mediated effects, the Parent-of-Origin Likelihood Ratio Test (PO-LRT) method
remains the only valid testing procedure (Weinberg 1999).

Parental transmission ratio: Dd (D:d = 1:1) x Dd (D:d = 1:1)

Offspring genotypes (not sex-related)               DD     Dd     dd
Observed proportion in embryos before lethality     1/4    1/2    1/4
Observed proportion in live borns                   1/3    2/3    0

Fig. 12.2 TRD caused by embryo lethality. We assume here that the mutant allele is
d and that lethality is autosomal recessive. As a result, the dd genotype is
eliminated before birth

TRD  can  also  be  caused  by  other  mechanisms  of  selection  not  occurring  in  the 
parents  but  after  the  embryo  is  formed.  Such  mechanisms,  termed  embryo  lethality, 
occur  when  embryos  with  a  specific  genotype  are  eliminated.  Because  this  leads  to 
an  imbalance  in  the  offspring  genotypic  ratios  as  illustrated  in  Fig.  12.2,  a  TDT 
approach  can  also  be  used. 
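
The live-born proportions in Fig. 12.2 are the Mendelian embryo proportions renormalized after the lethal genotype is removed. The sketch below reproduces that step and then computes the share of D among alleles carried by live-born offspring, which is what a transmission-based test sees. Function and variable names are hypothetical.

```python
from fractions import Fraction

def liveborn_proportions(embryo_props, lethal=("dd",)):
    """Renormalize genotype proportions after removing lethal genotypes."""
    surviving = {g: p for g, p in embryo_props.items() if g not in lethal}
    total = sum(surviving.values())
    live = {g: p / total for g, p in surviving.items()}
    live.update({g: Fraction(0) for g in lethal})
    return live

embryos = {"DD": Fraction(1, 4), "Dd": Fraction(1, 2), "dd": Fraction(1, 4)}
live = liveborn_proportions(embryos)

# Share of D among the alleles carried by live-born offspring
# (each DD carries two D alleles, each Dd carries one).
share_D = (2 * live["DD"] + live["Dd"]) / 2
print(live, share_D)
```

Although each parent transmits D and d in a 1:1 ratio at conception, the alleles observed in live-born offspring appear in a 2:1 ratio in this recessive-lethal example.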

Embryo  lethality  can  also  be  sex-specific,  which  induces  a  sex-of-offspring- 
specific  TRD.  The  analytical  strategy  is  the  same  as  above,  except  that  TDT  is 
performed  only  in  female  (respectively  male)  offspring.  Note  that  the  issues  related 
to  maternally  mediated  effects  discussed  above  are  also  relevant  in  this  context. 

Unfortunately,  since  TDT  looks  at  over-transmission  of  a  marker  allele  where 
embryos  with  the  faulty  genotype  could  not  have  survived,  it  is  impossible  to 
determine  whether  TRD  was  caused  by  mechanisms  occurring  in  the  parents 
(meiotic drive, gametic competition, or other mechanisms) or at the embryonic
stage  (embryo  lethality).  Note  also  that  in  the  following  developments,  the  TDT 
approach  can  be  applied  beyond  trios  to  extended  families  unselected  for  phenotype 
(Tiwari  et  al.  2008). 


12.2.2    Detecting  TRD  in  Extended  Families  Unselected 

for Phenotype Using Nonparametric Linkage Analysis

In  the  case  of  extended  families,  nonparametric  linkage  analysis  can  be  used  as  an 
alternative  to  the  TDT  approach.  Nonparametric  analysis  looks  at  over-sharing  of 
alleles identical by descent (IBD) between "affected" related pairs. Two or more
alleles are said to be IBD if they are identical copies of the same ancestral allele. An
over-sharing  of  alleles  IBD  between  related  affected  individuals  at  a  specific 
marker  indicates  linkage  between  this  marker  and  the  disease  susceptibility  locus. 
In  order  to  identify  TRD  in  families  unselected  for  disease,  all  offspring  are 
considered  as  "affected",  which  essentially  means  "having  survived".  Therefore, 
the  objective  is  to  determine  regions  in  the  genome  linked  to  the  phenotype  defined 
as  "being  alive  in  the  last  generation"  (Paterson  et  al.  2009).  This  analytical  strategy 
was  used  by  Paterson  et  al.  (2009)  in  the  Framingham  Heart  Study  cohort,  but  no 
loci met the genome-wide criteria for linkage. Note that the case of sex-specific
TRD  can  also  be  investigated  by  performing  linkage  analysis  separately  in  males  or 
females. 

As  discussed  before,  when  studying  over-sharing  of  a  marker  allele  where 
embryos  with  the  faulty  genotype  could  not  have  survived,  it  is  impossible  to 
determine  whether  the  observed  TRD  occurred  in  the  parents  or  at  the  embryonic 
stage.  As  a  result,  the  underlying  biological  mechanisms  driving  TRD  such  as 
germline  selection,  meiotic  drive,  gametic  competition  or  embryo  lethality  cannot 
be  differentiated  and  therefore  identified  precisely. 


12.2.3    Grandparental  Origin  TRD:  Imprinting  Errors 

In  the  types  of  TRD  described  above,  deviation  in  the  allelic  transmission  from  the 
Mendelian  ratio  is  inferred  based  on  what  is  observed  in  the  offspring  genotypes. 
Another form of TRD can occur which is induced by an imbalance in the grandparental
origin of the offspring's genotypes. Under Mendelian inheritance in humans,
each  individual  contains  the  genetic  information  transmitted  by  his/her  four 
grandparents,  with  an  expected  transmission  ratio  of  1:1:1:1.  However,  a  deviation 
from  this  ratio,  which  is  also  a  form  of  TRD,  can  be  explained  by  possible  imprint 
resetting  errors  in  the  parent's  germline  or  by  erroneous  maintenance  of  parental 
imprints  in  early  embryonic  development  stage.  Figure  12.3  illustrates  an  example 
of  a  three-generation  family  with  correct  imprint  resetting  and  maintenance.  In  this 
example,  we  assume  that  the  genetic  locus  is  maternally  imprinted,  which  means 
that  only  paternal  alleles  are  expressed  in  offspring.  As  we  see  in  Fig.  12.3,  imprint 
marks  have  been  correctly  reset  in  grandparents  A,  B,  C  and  D,  so  that  each  egg  cell 
contains  a  maternal  imprint  and  each  sperm  cell  contains  a  paternal  imprint.  As  a 
result,  both  individuals  in  the  second  generation  inherit  a  correctly  imprinted  allele 
from  their  mother  and  a  correctly  non-imprinted  allele  from  their  father.  The  same 
resetting  process  successfully  occurs  in  the  germline  of  the  second  generation 
individuals  (father  and  mother)  before  meiosis.  Then,  when  the  egg  from  the  mother 
is  fertilized  by  the  sperm  of  the  father,  each  of  them  transmits  a  correctly  imprinted 
allele to the offspring. As seen in Fig. 12.3, there is no deviation from the
Mendelian ratio in either the offspring genotypic ratios or the allelic origin of
parents and grandparents.

Fig. 12.3 Example of a three-generation family including four grandparents, two parents and
offspring.  We  consider  a  marker  with  two  alleles,  denoted  as  1  and  2.  Grandparents  are  denoted 
as  A,  B,  C  and  D  and  superscripts  at  each  genotype  indicate  the  grandparent  origin.  In  this 
example,  correct  imprint  resetting  occurs  in  the  germline  before  the  production  of  eggs  and 
sperm  cells.  We  assume  here  that  the  marker  is  maternally  imprinted  and  imprinted  marks  are 
represented  by  a  red  triangle 


Figure  12.4  illustrates  the  scenario  where  an  imprint  resetting  error  occurred  on 
allele  2  of  the  mother,  which  is  incompatible  with  embryonic  survival.  This  leads  to 
the  deviation  from  Mendelian  inheritance  ratio  in  the  allelic  origin  of  the 
grandparents.  Interestingly,  this  also  leads  to  a  deviation  from  the  Mendelian 
ratio  in  the  offspring,  which  seems  to  suggest  that  this  phenomenon  could  be 
captured  by  using  the  TDT  approach  in  trios  described  above. 

For  comparison,  Fig.  12.5  illustrates  a  similar  scenario,  but  the  imprint  resetting 
error  occurred  on  allele  1  of  the  father.  Similarly,  the  allele  which  failed  to  reset 
correctly  is  under-transmitted.  A  deviation  from  Mendelian  ratio  of  the  alleles  from 
grandparents  can  be  observed  in  the  offspring.  This  observation  is  the  basis  of  the 
statistical  analyses  aiming  to  uncover  TRD  induced  by  imprinting  errors. 

Fig. 12.4 Example of a three-generation family with imprint resetting error at allele 2 in mother.
Same scenario as in Fig. 12.3, except that an imprint resetting error occurred in the mother, which is
incompatible with embryonic survival

Two analytical strategies have been proposed in the literature to determine the
grandparental origin of TRD. First, a simple binomial test can be used to determine
whether the proportions of grandpaternal alleles and grandmaternal alleles are equal in
the offspring's genotypes for a given marker. In practice, TRD is estimated by the
proportion of grandmaternal alleles transmitted to the offspring (Naumova
et al. 2001; Yang et al. 2008). The method of maximum likelihood (Lange 1997)
can be used to estimate TRD in the presence of missing genotypes, by using
neighboring flanking markers as well as map distances (Croteau et al. 2002). Second, in
cases where embryo lethality due to imprinting error occurs in a sex-of-offspring-specific
manner, TRD can also be estimated by using a logistic regression model
predicting grandparental source (dichotomous outcome), where variables such as
sex of offspring and mating type of parents are included in the model (Yang
et al. 2008). In Yang et al.'s paper (2008), the grandparental origin at the TRD locus was
inferred on the basis of genotypes of the closest microsatellite markers. For
non-informative markers, it was inferred on the basis of the grandparental origin
of the flanking markers.
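
The binomial test on grandparental origin described above can be sketched as follows. This is an illustration under simplifying assumptions: every transmission is informative (no missing genotypes, so no maximum likelihood step is needed), transmissions are independent, and the counts are hypothetical. The function name is also hypothetical.

```python
from math import comb

def grandparental_binomial_test(n_grandmaternal, n_total):
    """Two-sided exact binomial test of H0: P(grandmaternal allele) = 1/2."""
    observed = comb(n_total, n_grandmaternal)
    # Under p = 1/2 every outcome i has probability comb(n, i) / 2**n, so
    # "at least as extreme" means a binomial coefficient no larger than observed.
    extreme = sum(comb(n_total, i) for i in range(n_total + 1)
                  if comb(n_total, i) <= observed)
    estimate = n_grandmaternal / n_total   # estimated TRD (grandmaternal share)
    return estimate, extreme / 2 ** n_total

# Hypothetical counts, for illustration only: 70 of 100 informative
# transmissions carry the grandmaternal allele.
estimate, p_value = grandparental_binomial_test(70, 100)
print(f"TRD estimate = {estimate:.2f}, p = {p_value:.2g}")
```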


12.3    Impact  of  TRD  in  Association  or  Linkage  Analysis 

Fig. 12.5 Example of a three-generation family with imprint resetting error at allele 1 in father.
Same example as in Fig. 12.3, except that an imprint resetting error occurred in the father, which is
incompatible with embryonic survival

When TRD occurs at a disease locus or at a locus in linkage disequilibrium
(LD) with the disease locus, a linkage or association signal would be inflated or
attenuated, potentially leading to a false positive or false negative result. On the
other hand, if TRD occurs at a locus distant from the disease susceptibility locus
(DSL) and is not in LD with the DSL, a linkage signal at the TRD locus would be
detected in families selected for disease, leading to a false positive signal. This
phenomenon has been investigated in several studies (Greenwood and Morgan
2000; Paterson et al. 2003, 2009; Zollner et al. 2004) quantifying the impact of
TRD on linkage results. Greenwood and Morgan (2000) studied the case of affected
sib pairs and showed that IBD patterns between two affected sibs are strongly
modified, leading to important bias in the significance of a sib-pair linkage analysis.
They also suggested increasing the sample size by a small amount to maintain the
desired power if TRD with modest deviation from Mendelian segregation is
suspected at the planning stage of a study.
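
To see why TRD alone shifts IBD sharing between sibs, consider a simplified setting that is our assumption rather than the derivation of Greenwood and Morgan (2000): both parents are heterozygous, each transmits its favored allele with the same probability tau, and transmissions to the two sibs are independent. Two sibs then share a parent's allele IBD with probability tau^2 + (1 - tau)^2, which exceeds 1/2 whenever tau differs from 0.5, regardless of disease status.

```python
def sib_pair_ibd_distribution(tau):
    """IBD distribution for a sib pair when each parent is heterozygous and
    transmits its favored allele with probability tau (tau = 0.5 is Mendelian).
    """
    # Two sibs share a given parent's allele IBD if both received the favored
    # allele or both received the disfavored one.
    share = tau ** 2 + (1 - tau) ** 2
    return {0: (1 - share) ** 2, 1: 2 * share * (1 - share), 2: share ** 2}

for tau in (0.5, 0.6, 0.75):
    dist = sib_pair_ibd_distribution(tau)
    mean_sharing = (dist[1] + 2 * dist[2]) / 2   # expected proportion shared IBD
    print(tau, dist, round(mean_sharing, 3))
```

Under this sketch the expected proportion of alleles shared IBD rises from 0.5 at tau = 0.5 to 0.52 at tau = 0.6, which an allele-sharing statistic would read as excess sharing.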

Several  approaches  have  been  suggested  to  overcome  the  bias  induced  by  TRD 
in  family-based  linkage  or  association  analysis.  When  a  TRD  locus  unrelated  to  a 
specific  disease  is  detected  in  families  unselected  for  phenotype,  it  is  not  surprising 
that  most  approaches  proposing  to  account  for  TRD  in  the  statistical  analysis  use  a 
combination  of  both  non-affected  and  affected  subjects.  For  instance,  Spielman 
et  al.  (1993)  proposed  to  use  a  mixture  of  case  trios  (affected  offspring  with  parents) 
and  control  trios  (unaffected  offspring  with  parents)  to  differentiate  true  linkage  or 
association  signals  from  false  positives  due  to  TRD  by  applying  a  TDT  to  both 
types  of  trios.  The  study  concluded  that  a  statistically  significant  TDT  in  case  trios 
but not in control trios suggests evidence of true linkage and association.
Furthermore, a statistically significant TDT observed in both case trios and control
trios, with a significant difference in transmission counts between case trios and
control trios, also suggests evidence of true linkage and association. Note that an
extension of the TDT using discordant sib-pair-parent tetrads has been proposed by
Deng et al. (2009) and can be used to assess significance of the TDT results. Finally,
we  propose  another  strategy,  which  consists  in  estimating  TRD  at  a  marker  locus  in 
a  sample  unselected  for  disease  and  modifying  the  null  hypothesis  of  association 
accordingly  in  the  sample  selected  for  disease.  For  example,  if  the  probability  to 
transmit  the  minor  allele  from  parents  to  offspring  is  estimated  to  be  0.6  in  the 
control  sample,  one  can  test  the  null  hypothesis  that  this  probability  is  0.6  in  the 
case  sample.  A  deviation  from  this  value  should  indicate  a  true  signal.  This  strategy, 
similar in essence to the one proposed by Spielman et al. (1993), will be detailed and
illustrated  in  the  case  study  section. 
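
The idea of shifting the null hypothesis can be illustrated with a one-degree-of-freedom goodness-of-fit test on the transmission counts. With p0 = 0.5 the statistic reduces to the usual TDT statistic (b - c)^2/(b + c); with p0 set to the control estimate it tests departure from the TRD-adjusted expectation. This is a sketch only: the counts are hypothetical, the function name is ours, and p0 is treated as known, which ignores the sampling uncertainty of the control estimate.

```python
import math

def transmission_test(b, c, p0=0.5):
    """1-df chi-square test of H0: P(transmit minor allele) = p0.

    b: transmissions of the minor allele from heterozygous parents
    c: transmissions of the major allele from heterozygous parents
    """
    n = b + c
    expected_b, expected_c = n * p0, n * (1 - p0)
    stat = (b - expected_b) ** 2 / expected_b + (c - expected_c) ** 2 / expected_c
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square (1 df) upper tail
    return stat, p_value

# Hypothetical case-trio counts, for illustration only.
b_case, c_case = 130, 70

print(transmission_test(b_case, c_case))           # Mendelian null, p0 = 0.5
print(transmission_test(b_case, c_case, p0=0.6))   # null shifted to the control estimate
```

In this made-up example the case-trio signal is strong against the Mendelian null but weak once the null is moved to 0.6, which is the pattern expected when the distortion is due to TRD rather than association.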

In  the  context  of  affected  sib-pair  designs,  Lemire  et  al.  (2004)  developed  a 
novel allele-sharing statistic that accounts for the possible bias induced by TRD.
This statistic evaluates the excess sharing of alleles in pairs of affected siblings, as
well  as  a  deficit  of  sharing  in  phenotypically  discordant  relative  pairs,  where 
available.  If  unaffected  siblings  are  available,  this  statistic  is  unbiased  in  TRD 
regions.  If  more  distantly  related  unaffected  subjects  are  available,  the  bias  is 
reduced  but  not  completely  eliminated.  Note  that  the  Haseman-Elston  regression 
model  for  affected  sib  pairs  is  not  affected  by  TRD  and  therefore  represents  another 
interesting  alternative  for  the  analysis. 

Casellas (2012) developed a Bayesian binomial model accounting for
TRD mechanisms in F2 mouse crosses. This model was used to perform genome-
wide  scans  for  TRD  quantitative  trait  loci  (QTL)  on  six  such  crosses.  Results 
suggest  a  relevant  incidence  of  TRD  phenomena  in  mouse  with  important 
implications  for  both  statistical  analyses  and  biological  research,  such  as  those 
underscored  in  this  chapter. 


12.4    The  Use  of  Controls  to  Differentiate  TRD  from  Real 
Signal:  A  Simulation  Study 

In  addressing  the  phenomenon  of  TRD,  which  acts  as  a  confounder  for  linkage  and 
association  signals,  Spielman  et  al.  (1993)  first  suggested  the  use  of  both  case  and 
control  trios.  The  proposed  method  is  to  apply  a  TDT  separately  on  case  trios  and 
control  trios.  The  method  is  illustrated  in  Tables  12.2  and  12.3,  with  marker  minor 
allele  denoted  as  M  and  major  allele  as  m. 
Spielman et al.'s study (1993) concluded that:

1. A statistically significant TDT in case trios suggests evidence of either linkage/
association,  or  TRD,  or  both. 

2. A statistically significant TDT in control trios suggests evidence of TRD, or both
TRD and linkage/association.

Table 12.2 TDT on case trios and control trios

                        Case trios                  Control trios
                        Non-transmitted allele      Non-transmitted allele
Transmitted allele      M            m              M            m
M                       $a_1$        $b_1$          $a_2$        $b_2$
m                       $c_1$        $d_1$          $c_2$        $d_2$

TDT test statistics: $\chi^2_1 = \frac{(b_1 - c_1)^2}{b_1 + c_1}$ (case trios), $\chi^2_2 = \frac{(b_2 - c_2)^2}{b_2 + c_2}$ (control trios)
Table 12.3 Pearson's Chi-square test on case trios and control trios

                   Transmitted allele in heterozygous parents
                   M            m            Row total
Case trios         $b_1$        $c_1$        $n_1$
Control trios      $b_2$        $c_2$        $n_2$
Column total       $n_b$        $n_c$        $n$

Pearson's Chi-square test statistic: $\chi^2 = \frac{n (b_1 c_2 - c_1 b_2)^2}{n_1 n_2 n_b n_c}$

Table 12.4 Simulation results for four scenarios each averaged over 500 simulations based on
TDT and Pearson's Chi-square test

            Presence of                   Significance   Significance    Significance of Pearson's
            linkage and     Presence      of TDT in      of TDT in       Chi-square test of case trios vs.
Scenarios   association     of TRD        case trios     control trios   control trios transmission counts
1           No              No            No             No              No
2           No              Yes           Yes            Yes             No
3           Yes             No            Yes            No              Yes
4           Yes             Yes           Yes            Yes             Yes

3.  A  statistically  significant  TDT  in  case  trios  but  not  in  control  trios  suggests 
evidence  of  true  linkage  and  association. 

4.  When  a  statistically  significant  TDT  is  observed  in  both  case  trios  and  control 
trios,  a  nonsignificant  Pearson's  Chi-square  statistic  of  case  trios  versus  control 
trios  on  transmission  counts  suggests  evidence  of  TRD  only. 

5.  When  a  statistically  significant  TDT  is  observed  in  both  case  trios  and  control 
trios,  a  significant  Pearson  Chi-square  statistic  of  case  trios  versus  control  trios 
on  transmission  counts  suggests  evidence  of  true  linkage/association  and  TRD. 
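
The conclusions listed above amount to a small decision rule on three tests. The sketch below encodes that rule for counts laid out as in Tables 12.2 and 12.3 (b = minor allele transmitted, c = major allele transmitted; subscript 1 for case trios, 2 for control trios). Function names, the 0.05 threshold, and the returned labels are our assumptions; all row and column totals are assumed to be positive.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))

def tdt_p(b, c):
    """p-value of the McNemar-type TDT on one set of trios."""
    return chi2_sf_1df((b - c) ** 2 / (b + c))

def pearson_p(b1, c1, b2, c2):
    """Pearson chi-square p-value on the 2 x 2 table of transmission counts."""
    n1, n2, nb, nc = b1 + c1, b2 + c2, b1 + b2, c1 + c2
    n = n1 + n2
    stat = n * (b1 * c2 - c1 * b2) ** 2 / (n1 * n2 * nb * nc)
    return chi2_sf_1df(stat)

def interpret(b1, c1, b2, c2, alpha=0.05):
    """Map the three test results to the interpretations listed in the text."""
    case_sig = tdt_p(b1, c1) < alpha
    control_sig = tdt_p(b2, c2) < alpha
    if case_sig and not control_sig:
        return "linkage/association"
    if case_sig and control_sig:
        differ = pearson_p(b1, c1, b2, c2) < alpha
        return "linkage/association and TRD" if differ else "TRD only"
    if control_sig:
        return "TRD, or both TRD and linkage/association"
    return "no evidence of linkage/association or TRD"

# Hypothetical counts, for illustration only.
print(interpret(120, 60, 90, 90))
print(interpret(120, 60, 120, 60))
```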

To verify Spielman et al.'s (1993) findings, we set up a simulation study for the following four scenarios described in Table 12.4.

The disease allele frequency (p) in the population was set between 0.01 and 0.05, indicating a rare to moderately rare disease frequency. The marker minor allele frequency (q) was set at 0.1. The underlying TRD influence on the marker locus had a ratio between 0.6 and 0.9 for the minor allele, exploring mild to extreme skew of transmission; here we define the TRD ratio to be the proportion of the preferred allele transmission counts among all transmission counts from parents to offspring at a specific locus. For example, if it is three times more likely to transmit the advantaged allele than the disadvantaged allele, the TRD ratio is 3/(3 + 1) = 0.75. The recombination fraction between disease and marker loci (θ) was specified as 0.1 in scenarios 3 and 4, when there was linkage and association between disease and marker loci, and was otherwise set to 0.5 (scenarios 1 and 2). A prespecified linkage disequilibrium (LD) parameter (δ) was adjusted for each disease allele frequency being tested, to ensure positive haplotype frequencies, which depend on disease and marker allele frequencies. Therefore, LD was set to be slightly less than the minimum of p(1 − q) and q(1 − p) when there was linkage and association (scenarios 3 and 4), and set to 0 otherwise (scenarios 1 and 2).
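
The constraint on the LD parameter can be checked numerically. The sketch below assumes the standard two-locus parameterization of haplotype frequencies (coupling haplotypes gain the LD parameter, repulsion haplotypes lose it), which the text implies but does not spell out. The function names, the labels D/d and M/m for minor/major alleles, and the example values are illustrative only.

```python
def haplotype_frequencies(p, q, delta):
    """Four haplotype frequencies for disease allele frequency p,
    marker minor allele frequency q and LD parameter delta.

    Assumed (standard) parameterization, not quoted from the chapter:
    coupling haplotypes (DM, dm) gain delta, repulsion haplotypes lose it.
    """
    freqs = {
        "DM": p * q + delta,
        "Dm": p * (1 - q) - delta,
        "dM": (1 - p) * q - delta,
        "dm": (1 - p) * (1 - q) + delta,
    }
    if min(freqs.values()) <= 0:
        raise ValueError("delta too large: haplotype frequencies must be positive")
    return freqs


def max_ld(p, q):
    """Upper bound on delta named in the text: min(p(1 - q), q(1 - p))."""
    return min(p * (1 - q), q * (1 - p))


if __name__ == "__main__":
    p, q = 0.01, 0.1           # values within the ranges used in the text
    delta = 0.0089             # illustrative: slightly below max_ld(p, q)
    for hap, freq in haplotype_frequencies(p, q, delta).items():
        print(hap, round(freq, 4))
```

Choosing the LD parameter just below `max_ld(p, q)` keeps every haplotype frequency positive while making the association between the two loci as strong as the allele frequencies allow.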

Based on the haplotype frequencies, which depend on the marker and disease allele frequencies and the LD parameter, we generated a population of 600,000 trios (parents and child). We then simulated random mating in this population. Recombination and transmission of alleles occurred with the probabilities stated above for recombination fraction and TRD ratio. Assuming a recessive mode of inheritance at the disease locus, we randomly sampled 500 case trios and 500 control trios from the simulated population. We then applied the TDT at the marker, for both the case and control trios. As suggested by Spielman et al. (1993), we further applied Pearson's χ² test to assess the excess or deficit in transmission of the minor allele over the major allele in case trios versus control trios. This procedure was repeated 500 times, and the test statistics were averaged over these 500 simulations. Both the McNemar test and Pearson's χ² test are 1-degree-of-freedom tests, and the p-values were computed accordingly from each of the four averaged test statistics. Our results support the study design, statistical method, and conclusions proposed by Spielman et al. (1993), as shown in Table 12.4. The simulation study was repeated for a dominant mode of inheritance, and the same results were obtained.
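
The two tests applied in each replicate can be sketched as follows. This is a minimal illustration, not the authors' simulation code: it assumes the TDT is computed as a McNemar statistic on transmissions from heterozygous parents, and that the Pearson test is applied without continuity correction to the 2 x 2 table of transmission counts (case/control by minor/major allele). The counts in the usage note are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def tdt(b, c):
    """McNemar-type TDT from heterozygous parents.

    b: transmissions of the minor allele, c: transmissions of the major allele.
    Returns (statistic, p-value).
    """
    stat = (b - c) ** 2 / (b + c)
    return stat, chi2_sf_1df(stat)


def pearson_case_vs_control(b_case, c_case, b_ctrl, c_ctrl):
    """Pearson chi-square (1 df, no continuity correction) comparing
    transmission counts in case trios with those in control trios."""
    n = b_case + c_case + b_ctrl + c_ctrl
    num = n * (b_case * c_ctrl - c_case * b_ctrl) ** 2
    den = ((b_case + c_case) * (b_ctrl + c_ctrl)
           * (b_case + b_ctrl) * (c_case + c_ctrl))
    stat = num / den
    return stat, chi2_sf_1df(stat)
```

With hypothetical counts of 150 minor and 100 major transmissions in both case and control trios, `tdt(150, 100)` is significant in each group while `pearson_case_vs_control(150, 100, 150, 100)` returns a statistic of zero, the pattern interpreted above as TRD only.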


12.5    TRD  and  Rare  Variants:  A  Simulation  Study 

The impact of TRD at the organismal level could become manifest at the population level as the human genome evolves over time. Therefore, TRD should also be studied in a population genetics context, because such selection leads to changes in the diversity of the population gene pool over generations. Changes in genetic diversity over time culminate, in the current population, in an equilibrium state for parameters such as the minor allele and haplotype frequencies at TRD and neighboring loci, and the linkage disequilibrium between loci. If the TRD selection is persistent through many generations, a gradual shift in the allele frequency at the
TRD  locus  would  be  observed.  Over  time,  the  positively  selected  allele(s)  could 
become  fixed  in  the  population  while  the  alternatives  are  completely  eliminated. 
This  may  provide  an  explanation  as  to  why  studies  have  been  able  to  discover  only  a 
small  number  of  TRD  loci,  because  alleles  at  some  of  these  TRD  loci  may  have 
already become monomorphic. Therefore, no genetic variation could be detected in the population at these "disappeared" TRD loci. However, studies of known TRD loci show that some negatively selected alleles still exist at a low frequency and remain polymorphic as rare variants. This raises the question of why such a selection process did not sweep the positively selected allele to fixation. Several authors have tried to answer this question by proposing sources of counter-balancing forces that keep the allele in a polymorphic state, such as
recombination  (Haig  and  Grafen  1991),  mutation  and  genetic  drift  (Polaski  1998), 
and  an  immunogenetic  advantage  for  survival  in  later  adulthood  regardless  of  low 
fertility  (Westendorp  et  al.  2001). 

The  existence  of  these  rare  variants  provides  us  with  great  insight  into  the 
understanding  of  TRD  selection  and  the  importance  of  corresponding  gene 
functions  at  these  loci.  Rare  disease  variants  are  currently  the  focus  of  genome- 
wide  association  studies  in  search  of  missing  heritability  in  complex  disorders 
(Maher  2008).  It  has  been  hypothesized  that  rare  disease  variants  could  be  more 
functional  than  common  variants  and  have  high  penetrance  (Bodmer  and  Bonilla 
2008;  Gorlov  et  al.  2008;  Kryukov  et  al.  2007).  This  suggests  a  potentially  similar 
role  for  negatively  selected  TRD  rare  variants  when  their  gene  functions  determine 
survival.  Since  there  is  usually  low  power  to  detect  rare  variants  using  a  standard 
genome-wide  genotyping  platform  with  feasible  sample  size,  there  are  intense 
ongoing  research  efforts  to  address  this  issue  (Cirulli  and  Goldstein  2010;  Li  and 
Leal  2009).  These  efforts  should  lead  to  a  better  understanding  of  TRD  and  its 
contribution  to  the  rare  variant  phenomenon  itself. 

Using the formulae of Chevin and Hospital (2006), we set up a simulation study to trace the marker allele frequency at a TRD locus over generations. The marker allele frequency (q) at generation 0 is set at 0.1 for the minor allele. The disease allele frequency (p) at a neighboring locus is set at 0.01, which corresponds to a rare disease allele. To simulate the presence of linkage and association, the recombination fraction (θ) is specified at 0.1 and the LD parameter (δ) at 0.0089, which is slightly less than the minimum of q(1 − p) and p(1 − q), to ensure positive haplotype frequencies. Let r be the TRD ratio and q_i be the marker allele frequency at the ith generation. Here we use a different notation from Chevin and Hospital (2006) to be consistent with our definitions specified above. The change in marker allele frequency in the ith generation, as shown in equation (1) of Chevin and Hospital (2006), is

$$\Delta q_i = q_i - q_{i-1} = (2r - 1)\, q_{i-1}\,(1 - q_{i-1}).$$

With rearrangement, the LD in the ith generation is approximated by

$$\delta_i = (1 - 2\theta)\, q_i (1 - q_i)\, \frac{\delta_{i-1}}{q_{i-1}\,(1 - q_{i-1})}$$

(seen in a different form in equation (9) of Chevin and Hospital (2006)), which decays over time. For an exact computation of LD, the more complex formula in equation (8) of Chevin and Hospital (2006) also includes the TRD ratio on top of the previously mentioned parameters.
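
The allele-frequency recursion can be iterated directly. The sketch below is an illustration under stated assumptions rather than the simulation used for the figures: it follows only the single-locus recursion for the marker allele frequency, ignores the linked locus and LD, and treats "fixation" as the frequency exceeding an arbitrary threshold, because the deterministic recursion only approaches 1 asymptotically. The function names and the threshold value are hypothetical.

```python
def next_q(q, r):
    """One generation: q_i = q_{i-1} + (2r - 1) * q_{i-1} * (1 - q_{i-1})."""
    return q + (2.0 * r - 1.0) * q * (1.0 - q)


def generations_to_fixation(r, q0=0.1, threshold=0.999, max_gen=10_000):
    """Number of generations until the favoured allele reaches `threshold`.

    `threshold` is an assumed working definition of fixation.
    """
    q, gen = q0, 0
    while q < threshold and gen < max_gen:
        q = next_q(q, r)
        gen += 1
    return gen


if __name__ == "__main__":
    for r in (0.6, 0.75, 0.9):
        print(r, generations_to_fixation(r))
```

A stronger distortion (r closer to 1) shortens the time to fixation, and r = 0.5 leaves the frequency unchanged, as expected under Mendelian transmission. The exact generation counts depend on the chosen threshold.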

Fig. 12.6 Illustration of genetic diversity over generations when TRD ratio persistently occurs at
0.9,  measured  by  haplotype  frequencies  of  TRD  and  neighboring  loci  with  minor  alleles  M  and  D 
respectively  (left),  and  LD  parameter  (right).  At  the  equilibrium  stage,  the  frequency  of  allele  M  at 
TRD  locus  is  the  sum  of  haplotype  frequencies  of  dM  and  DM  which  is  0.935  +  0.065  =  1,  where 
it  has  reached  fixation 


In  Fig.  12.6  (left  panel),  we  have  illustrated  the  change  in  diversity  in  terms  of 
haplotype  frequencies  of  the  linked  marker  and  neighboring  disease  loci  with  minor 
alleles  M  and  D  respectively,  and  the  LD  measure,  with  a  TRD  ratio  of  0.9. 
Furthermore, different combinations of TRD ratio, recombination fraction and LD have been explored to illustrate the range in the number of generations needed for each combination to reach fixation, as shown in Fig. 12.6 (right panel). The figure shows that as recombination fraction decreases or LD parameter
increases,  it  takes  longer  for  the  allele  at  the  TRD  locus  to  reach  fixation,  because 
the  presence  of  linkage  and  association  slows  down  the  selective  sweep.  On  the 
other  hand,  when  the  TRD  ratio  decreases,  it  also  takes  longer  for  the  allele  at  the 
TRD  locus  to  reach  fixation  because  of  the  milder  distortion  force. 

Through  our  simulation,  we  found  that  for  a  TRD  ratio  of  0.9,  fixation  can  be 
reached  in  ten  generations.  As  for  a  TRD  ratio  of  0.6,  it  can  take  up  to  80  generations 
to  reach  fixation,  depending  on  the  strength  of  linkage  and  association  between 
marker  and  disease  loci.  These  changes  in  genetic  diversity  over  time  culminate 
eventually  in  an  equilibrium  state  for  the  involved  parameters  in  the  population, 
namely  the  minor  allele  and  haplotype  frequency  at  TRD  and  neighboring  loci,  and 
LD  between  marker  and  disease  loci  (Huang  et  al.  2011). 


12.6    TRD and Its Impact on Association Analysis: A Case Study


Table 12.5 Transmission counts from a set of trios

                Not transmit M1    Not transmit M2
Transmit M1     a                  b
Transmit M2     c                  d

As mentioned above, a strategy to account for TRD in association analyses consists in estimating TRD at a marker locus in a sample unselected for disease and modifying the null hypothesis of association accordingly in the sample selected for disease. For instance, consider a marker with two alleles, M1 and M2, with frequencies q and (1 − q) respectively. Consider also a disease locus with two alleles and minor allele frequency p. Let D and θ be the linkage disequilibrium and recombination fraction, respectively, between the disease and marker loci. Suppose also that TRD occurs at the marker locus such that the probability to transmit M1 from a parent to the offspring is α. Under these assumptions, one can write, as a function of these parameters, the probability P(Transmit M1, not transmit M2) that a parent with genotype M1M2 transmits the allele M1 and does not transmit M2.


Similarly, one can compute P(Transmit M2, not transmit M1). Under this setting, testing the null hypothesis

$$H_0\colon\ P(\text{Transmit M1, not transmit M2}) = \frac{\alpha}{1-\alpha}\, P(\text{Transmit M2, not transmit M1})$$

is a valid test for association, since this null hypothesis is equivalent to θ = 1/2 and D = 0. If the TRD value α, the probability of transmitting M1, is known from the literature or from an external control sample, one can test for association with the disease locus in a sample of case trios. Assuming transmission counts as in Table 12.5, the chi-square test statistic is equal to

$$\chi^2 = \frac{[(1-\alpha)\,b - \alpha\,c]^2}{\alpha\,(1-\alpha)\,(b+c)}.$$

Note that this statistic reduces to the conventional TDT statistic, $\chi^2 = (b-c)^2/(b+c)$, when α = 0.5, which means that no TRD occurs. The advantage of this approach over Spielman's approach, which compares transmission in control and case samples, is that the TRD value α can be taken from the literature or from an external source, if such information is available.
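
The TRD-adjusted statistic is simple to compute from the off-diagonal counts of Table 12.5. The sketch below is a minimal illustration; the function name is hypothetical, the TRD value is written `alpha`, and the p-value is taken from a chi-square distribution with 1 degree of freedom, which is an assumption consistent with the conventional TDT.

```python
import math


def modified_tdt(b, c, alpha=0.5):
    """Chi-square statistic (1 df) for H0: M1 is transmitted with probability alpha.

    b: heterozygous parents transmitting M1 and not M2
    c: heterozygous parents transmitting M2 and not M1
    Returns (statistic, p-value).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    stat = ((1.0 - alpha) * b - alpha * c) ** 2 / (alpha * (1.0 - alpha) * (b + c))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, p_value
```

With `alpha=0.5` the function returns the conventional TDT statistic. With hypothetical counts b = 60 and c = 40, a classic TDT gives a statistic of 4, whereas testing against a known TRD value of 0.6 gives a statistic of essentially zero, because the observed transmission ratio matches the distorted null.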

This  approach  was  applied  to  a  study  on  the  association  between  thrombophilic 
gene  variants  and  the  occurrence  of  births  qualified  as  small-for-gestational-age 
(defined  below).  Thrombophilia  is  the  name  given  to  a  condition  characterized  by  a 
tendency  to  develop  thrombosis,  mostly  as  a  consequence  of  inherited 
polymorphisms.  The  most  commonly  studied  thrombophilic  polymorphisms  are 
the  C677T  and  A1298C  variants  in  the  methylene  tetrahydrofolate  reductase 
(MTHFR) gene; the G1691A variant in Factor V Leiden; and the G20210A variant in the Prothrombin (or Factor II) gene. Others, less studied, are those in the plasminogen activator inhibitor-1 (PAI-1) gene and the Factor XIII (F XIII) gene. Small-for-
gestational-age  (SGA)  birth  is  defined  as  birth  weight  below  the  10th  percentile  for 
gestational  age  and  sex,  according  to  national  standards.  The  underlying  hypothesis 
in  our  work  was  that  a  thrombophilic  predisposition  will  increase  the  risk  of 
vascular  thrombosis,  which  in  turn  can  lead  to  placental  insufficiency  and  small 
infants. 


^(Transmit  Ml,  not  transmit  M2)  =  2a{ )  2a [q  +  ^  (1  -  q)  -  2a0^ . 

Table 12.6 Results from the TDT tests. P-values (P) and transmission ratios (α) are presented

Locus          TDT in controls         Modified TDT in cases    Classic TDT in cases
MTHFR-C677T    P = 0.28, α = 0.46      P = 0.64                 P = 0.11, α = 0.45
FV             P = 0.005, α = 0.27     P = 0.00033              P = 0.44, α = 0.57
FII            P = 0.0002, α = 0.09    P = 0.07                 P = 0.002, α = 0.2

The  study  was  described  previously  (Infante-Rivard  et  al.  2002,  2003,  2005; 
Infante-Rivard  2010).  It  was  initiated  as  a  case-control  comparison  of  mother  and 
newborn  dyads  with  and  without  SGA  birth,  and  we  extended  the  data  to  form  case- 
parent  trios  by  including  genetic  material  obtained  from  fathers.  All  SGA  infants 
seen  at  our  university  center  in  Montreal,  Quebec,  Canada,  between  May  1998  and 
June  2000  were  eligible  if  they  were  singletons  who  were  born  alive  after  the  24th 
week  of  gestation  without  severe  congenital  anomalies.  During  that  period, 
505  case  mothers  were  seen  and  493  (98  %)  participated  in  the  study.  The  same 
criteria  applied  to  the  selection  of  control  mothers  (defined  as  women  whose  babies' 
birth  weights  were  at  or  above  the  10th  percentile  for  gestational  age  and  sex). 
Controls  were  matched  to  cases  on  gestational  week,  sex,  and  race/ethnicity  and 
were  selected  for  having  a  birth  date  closest  in  time  to  the  case's.  The  mothers  of 
480  controls  were  invited  to  participate,  and  472  (98  %)  accepted.  Medical  records 
were  used  to  determine  gestational  age  on  the  basis  of  the  obstetrician's  assessment 
from  ultrasound  and  other  clinical  data. 

Blood  was  obtained  from  448  case  newborns  (91  %  of  participants  and  89  %  of 
eligible  persons)  and  431  control  newborns  (91  %  of  participants  and  90  %  of 
eligible  persons).  Maternal  and  umbilical  cord  blood  samples  were  collected. 
Approximately  midway  through  the  study,  we  started  prospectively  collecting 
DNA  samples  from  fathers.  We  obtained  genetic  material  on  260  case  fathers  and 
248  control  fathers,  representing  a  response  rate  of  approximately  86  %  (of  those 
invited  to  participate).  Genotyping  is  extensively  described  in  previous  papers 
(Infante-Rivard  2010). 

First,  a  TDT  was  performed  in  the  control  sample  in  order  to  estimate  TRD  at  the 
6  SNPs.  Then,  a  modified  TDT  as  described  above  was  performed  in  the  case 
sample  by  adjusting  the  null  hypothesis  using  the  TRD  values  obtained  in  the 
control  sample.  A  classic  TDT  in  cases  was  also  performed  for  comparison 
purposes.  Results  are  presented  in  Table  12.6. 

As one can see, a TDT in the control sample leads to the detection of TRD at loci MTHFR-A1298C, FV and FII, with the probability α to transmit the minor allele from a heterozygous parent to offspring being 0.39, 0.27 and 0.09 respectively (recall that a probability of 0.5 represents Mendelian transmission). Testing at each locus the null hypothesis that the same α-deviation is also observed in the case sample gave very interesting results. First, the null hypothesis is strongly rejected at the Factor V locus, which suggests that an extra deviation from the TRD transmission is present in the case sample. Looking at the transmission counts, we also observed that the probability to transmit the minor allele from a heterozygous parent to offspring is 0.57 in the case trios, which does not suggest the presence of association or linkage if a standard TDT is applied (p-value 0.44). In fact, there are two TRDs acting in opposite directions at this locus. In the control population, a TRD is present which favors the transmission of the major allele. However, in the case population, the minor allele is over-transmitted, which suggests an association with the disease locus. Note that this could not be observed by performing a standard TDT on the case trios, since the two acting TRDs would tend to cancel each other and lead to an apparently Mendelian transmission in cases.

Most  studies  on  SGA  have  used  a  case-control  design.  Many  of  the  published 
studies  were  affected  by  likely  selection  bias  due  to  the  nature  and  the  way  in  which 
controls  were  selected,  leading  to  spurious  results.  Nevertheless,  despite  the  fact 
that  not  all  studies  reported  an  increased  risk  for  FV  (Dudding  et  al.  2004),  many 
studies  have  done  so  (Dudding  and  Attia  2004).  The  discrepancy  between  results 
from  case  trios  alone  (often  assumed  to  be  a  more  robust  design)  and  case-control 
studies  may  be  reconciled  with  the  approach  we  propose  here. 

Another  interesting  locus  to  study  is  the  Factor  II  locus.  A  TRD  is  observed  in 
the  control  population  as  well  as  in  the  case  population,  leading  to  a  significant  TDT 
test in the case trios. However, after adjusting for the deviation observed in the control sample, the TDT in the case population is no longer significant, which suggests that the association observed in the case trios using a standard TDT was a false positive.

In  conclusion,  although  not  well  known  or  appreciated,  TRD  is  a  population 
genetic  phenomenon  that  can  provide  insight  into  the  evolution  of  the  genome  as  we 
find  it  today;  on  a  more  practical  level,  and  as  we  have  argued,  considering  the 
possibility  of  TRD  in  linkage  and  association  studies  is  important  for  the  validity  of 
conclusions. 

Epigenome-Wide Association Studies:
Potential  Insights  into  Human  Disease 

Abstract The burden on human health due to common diseases, such as the metabolic syndrome, cardiovascular and inflammatory disorders, is extreme. With increasingly longer-lived populations, the morbidity of these chronic conditions leads to vast physical, psychological and economic cost. In order to further understand the pathogenicity of these complex diseases, the intertwined influence of environmental factors and polygenic susceptibility needs to be unravelled. This may at first seem an insurmountably difficult task, but progress has been made in recent years.

GWAS        Genome-wide association study
HSM         Haplotype-specific methylation
LD          Linkage disequilibrium
LMR         Low methylation region
MeDIP-seq   Methylation-dependent immunoprecipitation second-generation sequencing
MDR         Methylation determining region
MVP         Methylation variable position
RRBS-seq    Reduced representation bisulphite second-generation sequencing
TFBS        Transcription factor binding site
WGAS        Whole genome sequencing association study

13.1  Introduction 

The  burden  on  human  health  due  to  common  diseases,  such  as  the  metabolic 
syndrome, cardiovascular and inflammatory disorders is extreme. With increasingly longer lived populations, the morbidity of these chronic conditions leads to
vast  physical,  psychological  and  economic  cost  (Clarke  et  al.  2010).  In  order  to 
further  understand  the  pathogenicity  of  these  complex  diseases,  the  intertwined 
influence  of  environmental  factors  and  polygenic  susceptibility  needs  to  be 
unravelled.  This  may  at  first  seem  an  insurmountably  difficult  task  but  progress 
has  been  made  in  recent  years. 

The advent of Genome-Wide Association Studies (GWAS) (Sladek et al. 2007; The Wellcome Trust Case Control Consortium 2007), built upon population-based linkage disequilibrium (LD) maps (Altshuler et al. 2005) and facilitated by high-throughput array technology, has been highly successful in part of this endeavour. Extensive global scientific work and expenditure has enabled the identification of a large number of common genetic variants, single-nucleotide polymorphisms (SNPs), in strong and replicable associations with a wide range of common diseases (see http://www.genome.gov/gwastudies/). However, individually these factors only convey a relatively small level of risk and together do not come close to accounting for the estimated heritability of these disorders (Manolio et al. 2009). Additionally, the discovered SNPs are rarely found to modify the coding sequence; approximately 85 % of disease-associated GWAS variants are non-coding (Hindorff et al. 2009). Various potential explanations for this "missing heritable" element have been offered, including methodological factors, such as the inability of GWAS to differentiate between the actual causal variant and another variant in strong LD, the indirect detection of rare and structural variants, and potentially the identification of, or interactions between, stable epigenetic modifications (McCarthy et al. 2008). It must also be acknowledged that there may be some over-estimation of these heritability measures due to a lack of accounting for genetic interactions (epistasis) (Zuk et al. 2012).

Fig. 13.1 The environment interfaces through the epigenome to influence genome and affect
expression.  Furthermore  non-coding  transcripts  such  as  micro-,  piwi-interacting-,  small 
interfering-  and  long  non-coding-RNA  may  influence  gene  expression  by  modulation  of  other 
transcripts  and/or  the  epigenome  (Kaikkonen  et  al.  2011).  Epigenome  modifying  proteins,  such  as 
histone  methylators,  deacetylators,  etc.,  also  influence  the  structure  of  the  epigenome,  thereby 
influencing  expression  (Dawson  et  al.  2012) 


The functional interpretation of the robustly replicated GWAS variants discovered to date is the vital next step to fully reap their potential therapeutic and preventative benefits. Even though an individual association only confers a small incremental increase in relative risk, resolving the biological variation that the allelic sequence difference causes can still identify previously unknown pathophysiological pathways. Recent evidence that many of these GWAS-associated SNPs commonly lie within regulatory DNA has come via genome-wide DNase I hypersensitivity site (DHS) experiments. This method detects open chromatin that can be cut by this enzyme, and therefore potential functional regions of the genome. This enabled Maurano et al. to identify increasingly stronger associations between associated SNPs and DHSs when the experiments were performed with the most disease-appropriate cell types (Maurano et al. 2012), thus indicating the potential functional consequences of these variants when they are located in their correct pathogenic tissues.

Once  robust  disease-associated  genetic  variants  have  been  identified  via  GWAS, 
or  perhaps  Whole  Genome  Association  Studies  (WGAS)  (Jonsson  et  al.  2012),  the 
medical  challenge  is  to  try  to  decipher  how  these  hereditary  components  interact 
with  the  external  environment  to  control  health  outcomes.  Factors  such  as  stress, 
diet  and  toxins  may  be  translated  into  biological  effectors  on  the  genome  via 
epigenetic  changes  (Feinberg  2008;  Bell  and  Beck  2010)  (Fig.  13.1).  Therefore 
these  genetic  effects  do  not  act  in  isolation.  It  is  how  these  external  factors  may 
influence  these  disease  susceptibilities,  how  they  may  interface  with  the  genome  via 
the  epigenome  (Bell  and  Beck  2010)  and  whether  there  is  tractable  ability  to 
measure  this  effect  that  may  bring  further  breakthroughs  in  the  understanding  of 
disease  pathogenesis. 



C.G.  Bell 


13.2    The  Epigenome 

There  are  greater  than  200  distinct  cell  types  within  the  human  body  (Alberts 
et  al.  2008)  and  all  these  cells  possess  the  same  genome,  barring  somatic  mutation. 
It is the packaging and chemical modification of the ~2 m length of DNA within each cell (Strachan and Read 1999) that influences gene expression and therefore enables tissue-specific activity. The components of this overlying mechanism are termed the "epigenome". This includes modifications of the DNA molecule itself, as well as alterations to the histone proteins that the helix winds around, including amino acid variants of the protein structure and post-translational modifications of the protruding tails of these molecules. Also, certain species of non-coding RNA,
such  as  micro-,  piwi-interacting-,  small  interfering-  and  long  non-coding-RNA  may 
be  considered  as  part  of  the  epigenome,  due  to  their  influence  on  gene  expression 
(Kaikkonen  et  al.  2011). 

A steadily expanding list of epigenetic modifications is now known to exist, currently including four DNA modifications: DNA methylation (5mC), hydroxymethylation (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) (Branco et al. 2012); 16 classes of histone tail modification (Dawson and Kouzarides 2012; Tan et al. 2011), leading to more than 200 alterations; and histone variants such as H2A.Z and H3.3 (Conerly et al. 2010; Jiang and Pugh 2009) (see Table 13.1). The 5-methyl modification of cytosine is also found to occur in RNA (Squires et al. 2012), and recently an N6-methyladenosine (m6A) modification of RNA was characterised genome-wide (Dominissini et al. 2012; Meyer et al. 2012), for which the FTO gene product functions as a demethylase (Jia et al. 2011).

The  strict  definition  of  a  "true"  epigenetic  mechanism  requires  the  modification  to 
be  able  to  be  propagated  through  mitotic  replication  and  therefore  to  be  mitotically 
heritable. The mechanism for this inheritance is well understood for DNA methylation, though less certain for other modifications (Bird 2002). Three DNA methyltransferase enzymes classically act as maintenance (DNMT1) and de novo (DNMT3A and DNMT3B) methyltransferases, respectively. The maintenance enzyme,
DNMT1,  acts  by  recognising  hemi-methylated  DNA  (Margueron  and  Reinberg 
2010),  although  this  orthodox  understanding  may  in  fact  be  an  oversimplification 
with  all  three  potentially  now  thought  to  play  a  role  in  this  process  (Jones  and 
Liang  2009). 

In order to investigate the epigenome there needs to be clarity as to how this layer of information differs from the underlying genome. This can be broken down into four principal facets. Firstly, it is by function tissue-specific (Fernandez et al. 2012; Bock et al. 2012). Secondly, it is not static but changeable over time, due to developmental programming alterations (Ferguson-Smith 2011), random drift (Feil and Fraga 2011), age-related modifications within specific loci (Teschendorff et al. 2010; Rakyan et al. 2010) or environmental influence (Feinberg 2007). Thirdly, it can vary between homologous chromosomes, either in a parent-of-origin-specific manner that occurs within imprinted loci required for normal development, such as the Prader-Willi locus (Ferguson-Smith 2011), or by other non-imprinted allele-specific means. This will manifest as allele-specific methylation

Table 13.1 Epigenetic modifications and marks (Dawson and Kouzarides 2012; Conerly et al. 2010; Jiang and Pugh 2009; Squires et al. 2012; Dominissini et al. 2012; Meyer et al. 2012)

DNA modifications            5-Methylcytosine                          5mC
                             5-Hydroxymethylcytosine                   5hmC
                             5-Formylcytosine                          5fC
                             5-Carboxylcytosine                        5caC
Histone tail modifications   Acetylation                               K-ac
                             Methylation (lysine)                      K-me1, K-me2, K-me3
                             Methylation (arginine)                    R-me1, R-me2s, R-me2a
                             Phosphorylation (serine and threonine)    S-ph, T-ph
                             Phosphorylation (tyrosine)                Y-ph
                             Ubiquitylation                            K-ub
                             Sumoylation                               K-su
                             ADP ribosylation                          E-ar
                             Deimination                               R
                             Proline isomerisation                     P-cis
                             Crotonylation                             K-cr
                             Propionylation                            K-pr
                             Butyrylation                              K-bu
                             Formylation                               K-fo
                             Hydroxylation                             Y-oh
                             O-GlcNAcylation (serine and threonine)    S-GlcNAc; T-GlcNAc
Histone variants             H2A.Z                                     H2A.Z
                             H3.3                                      H3.3
RNA modifications            5-Methylcytosine                          5mC
                             N6-methyladenosine                        m6A

(ASM) or allele-specific histone modifications (ASHMs). Finally, the epigenome can be influenced by genetic polymorphism (Kerkel et al. 2008; Shoemaker et al. 2010). This can drive differences between individuals or may also contribute to the non-imprinting-related allele-specific variation between homologous chromosomes.

The tissue-specific epigenome of a particular cell type of an organ, for example the liver or hepatocyte epigenome, will differ more from that of a skin cell within the same individual than the hepatocyte epigenomes of two individuals differ from each other, i.e. intra-individual inter-tissue variation exceeds inter-individual variation in any given
tissue  (Davies  et  al.  2012).  In  fact  the  tissue-specific  epigenome  even  between 
two  related  species,  such  as  human  and  chimpanzee,  will  be  more  similar  than  when 
compared  with  a  different  tissue  or  cell  type  within  the  same  individual  (Pai 
et  al.  2011).  However  these  tissue-specific  templates  will  not  be  precisely  identical 
as  genetic  variation  and  other  potential  environmental  influences  lead  to  subtle 
modifications  between  people  (Brena  et  al.  2006).  It  is  this  fraction  of  variability 
upon  the  underlying  rigid  tissue-specific  design  that  we  are  most  interested  in  for 
disease  studies,  and  hope  to  dissect  within  this  the  proportion  that  can  be  attributed 
to  genetic  variability,  and  that  which  is  arising  from  environmental  influences. 

Additionally, if we wish to perform studies on the epigenome, we need to be well aware that there is high potential for random artefacts if cell lines are used as the analyte, owing to the stochastic changes caused by the artificial environment and procedures experienced by these cells (Brennan et al. 2009; Aberg et al. 2012). Similarly, we need to recognise that whole-blood-derived DNA is from a mixture of blood cell types, so the resultant epigenetic findings will be a proportional representation of the epigenomes of all the constituent blood cell types present.


13.3    DNA  Methylome 

DNA methylation, the addition of a methyl group to the 5-carbon of cytosine, is
currently the most well-studied epigenetic modification. In differentiated human
cells this occurs almost exclusively in the context of a CpG dinucleotide: a
cytosine followed by a guanine, in the 5′ to 3′ direction along the DNA strand,
joined by the phosphodiester bond. Normal embryonic development requires the
ability to methylate these cytosines, as this mechanism is involved in genomic
imprinting, X chromosome inactivation, repression of transposable elements and
differentiation (Robertson 2005). There are approximately 28 million CpG
dinucleotides within the human genome and around 70-80 % of CpGs are methylated
in mammals (Lister et al. 2009). Those CpG dinucleotides that remain unmethylated
predominantly occur in clusters, termed "CpG islands" (CGI), which sparsely
punctuate the genome. These islands number 27,718 or 22,374 according to the UCSC
or Ensembl databases, respectively, the discrepancy being due to differing repeat
masking and size requirements in the respective prediction algorithms
(Illingworth and Bird 2009). CGI account for ~7 % of all genomic CpGs. When
unmethylated, these clusters recruit CpG-binding proteins and chromatin-modifying
enzymes, such as Cfp1 and KDM2A, leading to the modification of histone tails
(Blackledge et al. 2010; Thomson et al. 2010) and the formation of permissive
chromatin. CpG islands can therefore be described as a "platform" on which gene
transcription is able to occur (Blackledge and Klose 2011). Approximately half of
CGI are found at the promoter regions of genes, with the remainder evenly split
between intergenic and intragenic locations (Illingworth et al. 2010). Of these
non-promoter CGI, those located elsewhere within genes may act as alternate
isoform promoters (Illingworth et al. 2008; Maunakea et al. 2010), and some of
those between genes may be involved in currently unidentified non-coding
transcripts.

The islands stand out because the vast majority of the genome is depleted of CpG
dinucleotides. By comparison, there are ~120 million of the exact reverse
dinucleotide, GpC. This depletion is due to the hypermutability of methylated
cytosines (Duncan and Miller 1980; Roach et al. 2010), which readily deaminate to
thymine; deamination of unmethylated cytosine, by contrast, yields uracil.
Therefore, where cytosine methylation occurs within the germ-line cells of sperm
or egg, there is a high chance of mutational loss of these CpGs over evolutionary
time. These dinucleotides are converted to TpG or CpA depending on the strand of
the deaminated cytosine. This is well demonstrated by the fact that 18.2 % of
human pathological lesions were documented by Cooper et al. to be C to T or G to
A transitions located within a CpG context (Cooper et al. 2010); transitions at
CpGs occur at more than ~13 times the rate of non-CpG transitions, and so play a
significant role in SNP occurrence (Li et al. 2009). This CpG-SNP variability has
been shown to have considerable influence on the surrounding methylome and
therefore contributes strongly to ASM variability (Shoemaker et al. 2010).
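
The CpG depletion and strand-dependent mutational outcome described above can be
illustrated with a minimal Python sketch. The function names and the toy sequence
are hypothetical and are used only for illustration; real analyses would run over
a reference genome rather than a short string.

```python
def count_dinucleotide(seq: str, dinuc: str) -> int:
    """Count occurrences of a dinucleotide read 5' to 3' on the given strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def deaminate_cpg(seq: str, cpg_index: int, strand: str) -> str:
    """Return the top-strand sequence after mutational loss of one CpG.

    cpg_index is the position of the C of a CpG on the top strand.
    strand="+": the top-strand C becomes T, giving TpG.
    strand="-": the C paired with the G (bottom strand) becomes T,
                which reads as CpA on the top strand.
    """
    seq = seq.upper()
    if seq[cpg_index:cpg_index + 2] != "CG":
        raise ValueError("no CpG at this position")
    if strand == "+":
        return seq[:cpg_index] + "TG" + seq[cpg_index + 2:]
    if strand == "-":
        return seq[:cpg_index] + "CA" + seq[cpg_index + 2:]
    raise ValueError("strand must be '+' or '-'")


toy = "ATGCGCATGCAA"  # hypothetical toy sequence, not real genomic data
print(count_dinucleotide(toy, "CG"), count_dinucleotide(toy, "GC"))
print(deaminate_cpg(toy, 3, "+"))  # CpG lost to TpG
print(deaminate_cpg(toy, 3, "-"))  # CpG lost to CpA
```

Comparing the CpG and GpC counts over a long sequence gives a simple view of the
depletion, since the two dinucleotides would be expected at similar frequencies
in the absence of CpG-specific mutation.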

Dramatic DNA methylome changes occur in cancer genomes, such as global
hypomethylation with concurrent de-repression of pathogenic repetitive elements,
and locus-specific hypermethylation of tumour suppressor genes (Esteller 2007).
However, with the increased resolution in methylome analysis now possible,
initially with array technology but increasingly with second-generation
sequencing, more subtle signatures have been identified. These include the
finding that differentially methylated regions (DMRs) are more likely to be seen
in the regions approximately 2 kb up- and downstream of islands, termed "CpG
island shores", than within the CpG islands themselves (Irizarry et al. 2009).
These regions were found to contain the most significant tissue-, cancer- and
reprogramming-specific changes (Doi et al. 2009), and in malignant tissue the
changes were shown to arise partly from loss of the strictly delineated
methylation boundary at island borders (Hansen et al. 2011). High-resolution
analysis in human and chimpanzee sperm has shown that, within these germ-line
cells, islands have wider regions of hypomethylation that stretch further into
their shore regions (Molaro et al. 2011). These larger hypomethylated island and
shore regions recede to different degrees during differentiation, and this tidal
nature of methylation around islands can also be seen clearly in the process of
haematopoietic delineation (Hodges et al. 2011). The first human bisulphite
sequencing methylomes, published by Lister et al. and performed in embryonic stem
cells and fibroblasts, also identified regions of intermediate methylation,
defined as partially methylated domains (PMDs), as being disproportionately lost
in the differentiation process (Lister et al. 2009).

DNA methylation has location-specific functional effects depending upon where
within the DNA sequence it resides: it is repressive within CpG islands and CpG
island shores, activating within gene bodies (Hellman and Chess 2007), and
potentially repressive within transcription factor binding sites (TFBSs). Whilst
clusters of CpG dinucleotides may act in concert, the CpG dinucleotide can also
be thought of as a genomic signalling molecule in its own right (Bird 2011). The
presence or absence of an individual CpG within a TFBS, which determines whether
methylation is possible there, may influence the variability of binding, as seen
in recent CTCF data (Wang et al. 2012). In this ENCODE study, 41 % of the
variability in CTCF occupancy was associated with differential DNA methylation.
Sites that showed more prevalent methylation-associated variability were enriched
for the genetic occurrence of a CpG dinucleotide at two positions within the CTCF
motif (Fig. 13.2). The presence of these CpGs allowed methylation variability to
influence occupancy ~2 times more than at motifs lacking them, and such sites
were termed CTCF "susceptible" sites (Wang et al. 2012). Lack of CpGs at these
positions decoupled CTCF occupancy from differential methylation.

Fig. 13.2 Figure from Wang et al. (2012) Genome Research 2012, Cold Spring Harbor
Laboratory Press. CTCF sites at which methylation inversely influences the
probability of binding are enriched for the occurrence of two CpGs within the
canonical CTCF binding motif. The Y-axis is the frequency of a CpG dinucleotide,
at the positions within the CTCF motif displayed on the X-axis, for those sites
where methylation is associated (red) or not associated (grey) with occupancy
changes. At positions 1 and 11, there is a 2.2- and 1.8-fold enrichment of CpG
presence, respectively. Twenty-nine percent of CTCF motifs genome-wide contain a
CpG at one or both of these positions

The variable methylation of the binding sites of this insulator protein, CTCF,
has also been shown to contribute to alternative pre-mRNA splicing. When located
within 1 kb downstream of an exon, bound CTCF promotes RNA polymerase II pausing
and consequently enhanced inclusion of the nearby exon. DNA methylation within
these motifs inhibits CTCF binding and has been found to result in increased exon
exclusion (Shukla et al. 2011).

The role of DNA methylation as an informative cellular switch was strongly
demonstrated in a study by Bock et al., also in blood cell differentiation, which
showed a two-tier epigenetic control: within lymphoid cells, both the genetic
loci and the binding sites of myeloid specification transcription factors were
methylated (Bock et al. 2012). Even spurious transcription of these factors
within lymphoid cells would therefore have a limited effect. The same paper
showed that DNA methylation differences were five times more frequent between
lineages than within them, compared with two times for expression (Bock
et al. 2012). DNA methylation was able to depict cellular identity accurately,
the most powerful loci being those with intermediate methylation levels, which
include, but are not restricted to, CGI shore regions (Bock et al. 2012).

However, there is currently debate as to whether the DNA methylation observed
within regions such as TFBSs is an active process that then precludes binding, or
whether it occurs passively as a consequence of the lack of this interaction. In
a mouse methylome experiment, Stadler et al. identified distal regulatory regions
with enhancer chromatin signatures that had CpG densities of ~2.5-5 %, above the
genome baseline, included about 4 % of all mouse CpGs, and possessed low
methylation, averaging 30 % (Stadler et al. 2011). These loci were termed low
methylation regions (LMRs) (Fig. 13.3). They observed that the binding of cell
type-specific transcription factors to these regions led to loss of methylation.
Passive deposition of methylation within cell-specific TFBSs after they are
vacated was also supported by Thurman et al. within DNase I hypersensitivity
sites (DHS) (Thurman et al. 2012).

Fig. 13.3 CpG density across the genome. CpG islands possess a CpG density of
> 12 % up to over 30 % (where 100 % CpG density = 50 CpGs in 100 bp). CpG island
shores (Irizarry et al. 2009) form an intermediate level of density beside
islands before dropping back to genome baseline levels of ~2 %. Low methylation
regions (LMRs) (Stadler et al. 2011) are predominantly promoter-distal regions
that rise above this genomic ocean baseline
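
The CpG density scale used in this section (100 % corresponding to 50 CpGs in
100 bp) can be made concrete with a short Python sketch. The function names are
hypothetical, and the cut-offs in `label_density` are only the approximate values
quoted in the text (> 12 % for islands, ~2.5 % and above for regions rising over
the ~2 % baseline); they are assumptions for illustration, not the criteria of
any island-prediction algorithm.

```python
def cpg_density_percent(window: str) -> float:
    """CpG density of a window, where 100 % means a CpG at every possible
    non-overlapping position (e.g. 50 CpGs in 100 bp)."""
    window = window.upper()
    if len(window) < 2:
        return 0.0
    n_cpg = sum(1 for i in range(len(window) - 1) if window[i:i + 2] == "CG")
    max_cpg = len(window) / 2
    return 100.0 * n_cpg / max_cpg


def label_density(density: float) -> str:
    """Rough label from the approximate cut-offs quoted in the text (assumed)."""
    if density > 12.0:
        return "island-like"
    if density >= 2.5:
        return "above baseline"
    return "baseline"


# Hypothetical 100 bp toy windows containing 7, 2 and 1 CpGs.
for window in ("CG" * 7 + "A" * 86, "CG" * 2 + "A" * 96, "CG" + "A" * 98):
    d = cpg_density_percent(window)
    print(f"{d:.1f} % -> {label_density(d)}")
```

Density alone cannot separate a shore from an LMR, since both sit between the
island and baseline levels; location relative to islands and promoters is also
needed for that distinction.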

Even if methylome variability within these binding sites is passive, it remains
informative of stable DNA-protein interactions. Supporting this are recent data
on TFBS "footprints" identified within the larger DHS sites, where protein
binding protected the small motif region from DNase I cleavage (Neph
et al. 2012a). A connection between regulatory factor occupancy and
base-resolution DNA methylation status was shown: methylation is decreased within
DHS sites and, within these regions, is significantly lower inside the motif
"footprints" than outside them (Neph et al. 2012a) (Fig. 13.4). Measures of DNA
methylation additionally have the advantage of a robust replication system and
slower turnover compared with the potential transience of expression. Equally, we
need to consider how a completely passive role would be reconciled with the
long-standing evidence that DNA methylation represses transcription in vivo
(Siegfried et al. 1999).

Genetic influences on the methylome can be split into those caused directly by
the CpG dinucleotide itself and those of other polymorphic cis- or trans-effectors
that influence the methylation machinery. CpG density within CGIs affects their
dynamic ability to be methylated: the vast majority of high-density CGI are
unmethylated irrespective of their transcriptional state, whereas low-density CGI
are the preferred template for tissue-specific methylation (Weber et al. 2007;
Eckhardt et al. 2006). In addition, proposed TFBSs within CGI influence the
likelihood of methylation; these are termed methylation determining regions
(MDRs) (Lienert et al. 2011).

Fig. 13.4 Figure from Neph et al. (2012a). Reprinted by permission from Macmillan
Publishers Ltd: Nature 489:7414 © 2012. Methylation differs significantly between
the outside and inside of DNase I hypersensitivity sites and, within these sites,
between transcription factor binding footprints and the regions outside the
footprints (data from IMR90 cells, P < 2.2 × 10^-16, Mann-Whitney U-test)

Allelic SNP variation may act as a cis-effector within regulatory motifs,
enabling methylation (Kerkel et al. 2008; Tycko 2010; Schalkwyk et al. 2010). The
resulting ASM may contribute to, and highlight, potential allele-specific
expression (ASE) patterns (Bell and Beck 2009). Quantitative trait loci (QTL) for
DNA CpG methylation have been identified in brain tissue (Zhang et al. 2010),
with cis peaks only 45 bp from the focal CpG site, and were more likely to occur
for non-CGI CpGs (Gibbs et al. 2010). Fang et al. found in humans that regions of
ASM present in multiple cell types were frequently the promoters of non-coding
RNAs (Fang et al. 2012). Potential global trans-effectors on methylation have also
been identified, e.g., rs10876043 in DIP2B (Bell et al. 2011).

Environmental factors including nutrition, behaviour, stress and toxins, as well
as stochastic factors, have also been found to affect the epigenome (Faulk and
Dolinoy 2011). Because DNA methylation combines replication stability with
potential plasticity, it has been proposed as a quantitative biomarker of
lifetime environmental exposure or accrued pathogenic alteration (Feinberg 2007;
Bock 2012).


13.4    Variation  in  the  Methylome 


Differences identified in methylomes may depend upon the methodology implemented,
and the terminology needs to be understood accurately in order to appreciate the
significance of findings. Base-resolution analysis enables the identification of
methylation variable positions (MVPs), which represent variation at an individual
cytosine. DMRs indicate a regional difference; this may come from comparison with
an affinity-based method such as MeDIP, or be defined over a number of contiguous
CpGs, a regional-sized window, or an annotated genomic feature (such as a CGI,
CGI shore, TFBS or enhancer) (Fig. 13.5).

Fig. 13.5 UCSC browser with mock data showing methylation variability. Yellow,
CpG < 20 % methylated; blue, CpG > 80 % methylated. CpG methylation data can be
compared (here between Subject 1 (S1) and Subject 2 (S2)), and variation can be
isolated to a single CpG, a methylation variable position (MVP), or be a regional
difference spread over a number of CpGs, a differentially methylated region
(DMR). These MVPs or DMRs can then be assessed to see whether they reside within
an annotated genomic feature, e.g., a CpG island, CGI shore, transcription factor
binding site or chromatin-defined state (ChromHMM (Ernst and Kellis 2012), see
Table 13.2). The example shown lies within an active promoter region, as defined
by ChromHMM in multiple cell types, with potential influence on a number of
transcription factor binding sites
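
The distinction between an MVP and a DMR can be sketched in a few lines of Python.
This is a minimal illustration only: the function names, the 0.2 difference
threshold and the minimum of three contiguous CpGs are all hypothetical choices,
and the methylation values are mock data in the spirit of Fig. 13.5.

```python
from typing import List, Sequence, Tuple


def find_mvps(s1: Sequence[float], s2: Sequence[float],
              min_delta: float = 0.2) -> List[int]:
    """Indices of single CpGs whose methylation differs by at least min_delta."""
    return [i for i, (a, b) in enumerate(zip(s1, s2)) if abs(a - b) >= min_delta]


def find_dmrs(s1: Sequence[float], s2: Sequence[float],
              min_delta: float = 0.2, min_cpgs: int = 3) -> List[Tuple[int, int]]:
    """(first, last) index of each run of at least min_cpgs contiguous CpGs
    that all differ by min_delta or more in the same direction."""
    dmrs = []
    start = None
    direction = 0
    n = min(len(s1), len(s2))
    for i in range(n + 1):  # the extra iteration closes a run at the end
        sign = 0
        if i < n:
            d = s1[i] - s2[i]
            if abs(d) >= min_delta:
                sign = 1 if d > 0 else -1
        if start is not None and sign != direction:
            if i - start >= min_cpgs:
                dmrs.append((start, i - 1))
            start = None
        if sign != 0 and start is None:
            start, direction = i, sign
    return dmrs


# Mock methylation fractions at eight consecutive CpGs (hypothetical data).
subject1 = [0.9, 0.9, 0.1, 0.8, 0.85, 0.9, 0.9, 0.1]
subject2 = [0.9, 0.1, 0.1, 0.2, 0.15, 0.1, 0.9, 0.1]
print(find_mvps(subject1, subject2))   # every CpG that differs
print(find_dmrs(subject1, subject2))   # only the contiguous regional difference
```

In this mock example the isolated difference at a single CpG is reported only as
an MVP, whereas the run of three adjacent differing CpGs is also reported as a
DMR.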

Differences between homologous alleles in methylation at an exact individual
cytosine may be defined as ASM. Differences over larger regions may be termed
haplotype-specific methylation (HSM) (Bell 2011); these may be driven genetically
by the co-ordinated phase of CpG-SNPs, or may include cis-genetic as well as
purely epigenetic differences and form a "hepitype" (Murrell et al. 2005).


13.5    EWAS:  Epigenome-Wide  Association  Studies 

The advent of high-throughput DNA methylation arrays, such as the Illumina
Infinium 450k, which interrogates DNA methylation at over 480,000 individual CpG
sites (Bibikova et al. 2011; Dedeurwaerder et al. 2011), enables large-scale
epigenomic studies to be performed. This technology uses bisulphite conversion to
render the cytosine at CpG loci into a pseudo-SNP depending on its methylation
status, after which technology comparable to Illumina SNP typing is applied. The
undoubted success of GWAS has unfortunately led many to assume, hopefully, that
the same study design would be equally successful in the Epigenome-Wide
Association Study (EWAS) context, where associations are tested for between a
phenotype and genome-wide epigenetic marks. However, the principal differences
between the genome and epigenome, as previously discussed, all contribute to make
case-control population-based studies in whole blood-derived DNA a less optimal
design than in GWAS.
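
The way bisulphite conversion turns methylation status into a pseudo-SNP can be
sketched as follows. This is a simplified illustration under stated assumptions:
only the top strand is modelled, conversion is taken to be complete, and the
function names and example calls are hypothetical rather than part of any array
or sequencing software.

```python
from typing import Iterable, Set


def bisulphite_convert(seq: str, methylated_positions: Set[int]) -> str:
    """Top-strand read-out after bisulphite treatment and amplification.

    Unmethylated cytosines are converted and read as T; methylated cytosines
    are protected and still read as C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def methylation_fraction(calls: Iterable[str]) -> float:
    """Fraction of C calls among C/T calls at one CpG cytosine (the pseudo-SNP)."""
    calls = [c.upper() for c in calls if c.upper() in ("C", "T")]
    if not calls:
        raise ValueError("no C or T calls at this position")
    return calls.count("C") / len(calls)


# Hypothetical 8 bp sequence with CpGs at positions 1 and 4; only the first
# CpG cytosine (index 1) is methylated.
print(bisulphite_convert("ACGTCGAC", {1}))
# Hypothetical calls for one CpG across four DNA molecules.
print(methylation_fraction(["C", "C", "T", "C"]))
```

Because whole blood-derived DNA is a mixture of molecules from many cells, the
measured value at each CpG is a fraction between 0 and 1 rather than a discrete
genotype, which is one reason EWAS analysis differs from SNP typing.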

The first EWAS performed in common diseases, including investigations in
systemic lupus erythematosus (Javierre et al. 2010), type 1 diabetes nephropathy
(Bell et al. 2010a), autism spectrum disorders (Nguyen et al. 2010), major
psychosis (Mill et al. 2008) and body mass index (BMI) (Feinberg et al. 2010),
were able to identify methylation changes, although these were uniformly very
small. Further external replication will be required, and we need to be very
conscious of the "winner's curse" that affected many genetic association studies
before the advent of rigorous GWAS methodology, whose findings subsequently could
not be replicated (Bell et al. 2007). At present there is no definitive and
mature methodology for EWAS, and it cannot yet be stated whether current
technological advances will be sufficient or whether further advances will be
required for a significant breakthrough (Heijmans and Mill 2012).

The largest effects identified to date have been due to the extreme environmental
influence of tobacco smoking on the methylome. An early smoking EWAS in
peripheral blood, using the previous Illumina 27k array, identified approximately
12 % lower methylation in smokers at a CpG, cg03636183, located in F2RL3
(coagulation factor II (thrombin) receptor-like 3). This was detected in only 65
heavy smokers (interquartile methylation range 78-88 %) and 56 non-smokers
(94-96 %), with replication in an additional 316 independent samples, numbers
that are small by GWAS standards (Breitling et al. 2011). Obvious and significant
potential confounders exist, including genetic heterogeneity and mixed cell
type-derived DNA. A later study used the 450k array in 1,062 newborn cord blood
samples from the Norwegian Mother and Child Cohort Study (MoBa) and found an
association between maternal plasma cotinine levels during pregnancy, a biomarker
of tobacco intake, and median methylation levels at 26 CpGs located in ten genes
(Joubert et al. 2012). The higher resolution analysis enabled more convincing
clustered CpG evidence to be identified, with three distinct loci each having at
least four CpGs with the same directional change. Replication in offspring cord
blood samples was, surprisingly, possible in only 18 smoking-exposed versus 18
non-smoking pregnancies: all showed the small directional change in methylation,
with 21 of the 26 CpGs significant at P < 0.05 and 5 still significant after
Bonferroni correction. The aryl hydrocarbon receptor (AhR) signalling pathway,
known to be involved in the detoxification of tobacco smoke, was highlighted by
eight implicated CpGs in two genes, AHRR and CYP1A1, in this in utero exposure.
Furthermore, this study individually supported the previous cg03636183 F2RL3
finding at the P < 0.05 level. To exclude major cell type variation contributing
to this result, the replication samples were separated into the major cell types
of mononuclear and polymorphonuclear cells; median methylation variation between
cell types was an order of magnitude less than that seen for maternal smoking.
One of the CpGs in AHRR, cg14817490, was also found to have significantly lower
methylation, in the same direction, in an EWAS of adult smokers (Monick
et al. 2012). That finding was additionally replicated in both B lymphoblastoid
cells and alveolar macrophages, so the initial finding had not been confounded by
cell type mixtures. Moreover, this AHRR finding (cg05575921) has now also been
replicated in youths, with a stepwise effect even from low levels of smoking
(Philibert et al. 2012). These initial successes in smoking EWAS indicate that,
if the effect is strong enough, confounding factors may potentially be overcome,
but they may also reflect what an extreme environmental influence this
carcinogenic mixture is.

Current practical reasons limit high-throughput epigenomic analysis to the DNA
methylation mark (Heijmans and Mill 2012), although this may rapidly change.
Recognised study design modifications to improve EWAS power include reducing the
genetic influence by using monozygotic twins who are discordant for the disease
studied (Bell and Spector 2012), if these are available at all, or longitudinal
studies over time within the same individuals, which again are rare resources
(Rakyan et al. 2011) (see also Chap. 14). Comparison between monozygotic and
dizygotic twins can be powerful in delineating genetic and potential
environmental causes (Bell and Spector 2012). A further difficulty is that, owing
to tissue specificity, it is highly desirable to use the pathogenic tissue
involved in the disease process. However, a recent study from Davies et al.
showed that some of the variation between individuals was reflected across both
brain and blood, indicating that the use of readily accessible surrogate
peripheral tissues may have limited utility at a subset of loci (Davies
et al. 2012). Variation identified across all somatic tissues could stem from
genetic influence; from epigenetic modulation induced extremely early in
development and subsequently maintained through all or most cells, such as
putative human metastable epialleles (Waterland et al. 2010); or from
environmental influences that lead to signatures across multiple cell types
whilst only leading to pathogenicity in a specific tissue. Because the majority
of the epigenome is established during embryogenesis and early foetal development
(Squires et al. 2012), this period is a critical window for potential
environmental effects (Gluckman et al. 2008).

The non-static nature of the epigenome also needs to be considered for the
potential confounding effects of age. As mentioned, specific loci have been
implicated in ageing, so correction will be required within these identified
regions in the promoters of polycomb group protein target genes and bivalent
domain loci (Teschendorff et al. 2010; Rakyan et al. 2010). Random drift over an
adult lifetime (Heyn et al. 2012) and developmental changes that continue into
childhood also need to be accounted for. For example, changes in monocytes in the
first 5 years of life, with a possible link to immune maturation, have been
identified (Martino et al. 2011).

Another potential confounder is the use of heterogeneous DNA from whole blood,
although simple integration with blood cell count data may enable correction for
dramatic change due to effects such as infection (Heijmans and Mill 2012). In
fact, extrapolation of the constituent cellular fractions from methylome data
alone has recently been described (Houseman et al. 2012) and will only become
more accurate with more detailed data. Finally, if robust changes are identified,
longitudinal disentanglement is required to determine whether they are in fact
causative in the pathology, an effect of the disease, or even an effect of the
treatment or medication regime given (Heijmans et al. 2009). Validation by an
alternate method remains desirable, although currently bisulphite conversion, as
implemented within the Illumina array protocol, is the gold standard. This will
quickly change if the potential of third-generation sequencers to read DNA
modifications directly comes to fruition (Flusberg et al. 2010; Clarke
et al. 2009).
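
The idea of inferring cellular fractions from methylation data alone can be
illustrated with a deliberately simplified sketch. It assumes only two cell types
with known reference methylation profiles and estimates the mixing fraction by
least squares; it is not the published method of Houseman et al. (2012), and the
function name and all values are hypothetical.

```python
from typing import Sequence


def estimate_fraction(mixture: Sequence[float],
                      ref_a: Sequence[float],
                      ref_b: Sequence[float]) -> float:
    """Least-squares estimate of the fraction p of cell type A, assuming
    mixture[i] = p * ref_a[i] + (1 - p) * ref_b[i] at every CpG i."""
    num = sum((m - b) * (a - b) for m, a, b in zip(mixture, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference profiles are identical at all CpGs")
    # A fraction cannot fall outside the interval from 0 to 1.
    return min(1.0, max(0.0, num / den))


# Hypothetical reference methylation at three CpGs for two cell types.
ref_a = [0.9, 0.1, 0.5]
ref_b = [0.1, 0.9, 0.5]
# Mock whole-blood-like sample made of 75 % type A and 25 % type B.
mixture = [0.7, 0.3, 0.5]
print(round(estimate_fraction(mixture, ref_a, ref_b), 3))
```

Only CpGs that differ between the reference cell types carry information about
the mixture, which is why the third CpG in this mock example contributes nothing
to the estimate.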

Epigenetics may play a causative role in complex diseases, but it will not
contribute to "missing heritability" (Slatkin 2009) unless it has a facultative
or obligatory relationship with genetics (Richards 2006). Amalgamation of these
analyses may therefore enable further insight into these common diseases. One
strategy is to utilise the considerable strength of GWAS and look for subtle
epigenomic variability within the regions robustly confirmed to contribute to the
disease process (Birney 2011; Bell 2011). Identified epigenetic factors may be
ratcheting the underlying genetic influence up or down. If these methylation
changes are, for instance, an environmentally driven modification, this would be
a mechanism for genetic and lifestyle factors to combine their influence.
Integrating epigenomic information, including DNA methylation and chromatin data,
within regions of genetic susceptibility may enable insight into the functional
mechanism, modes of inheritance and potential environmental modulation (Birney
2011).


13.6    Integration  of  Epigenomic  and  Genomic  Data 

Environmental stimuli have been found to increase or attenuate the effects of
common genetic predisposition. These modulators are likely to function through
epigenomic pathways. Examples include the FTO locus, the strongest common
susceptibility variant associated with BMI (Dina et al. 2007; Frayling
et al. 2007). Like most GWAS hits, the identified FTO association SNPs lie within
a non-coding region, in this case the first intron of the gene. In a large study
of over 200,000 adults by Kilpelainen et al., the effect of the susceptibility
allele was lessened by physical activity, which reduced the per-allele odds of
obesity from 1.3 to 1.22 (Kilpelainen et al. 2011). Additionally, in a recent
study by Qi et al., the genetic predisposition for BMI was calculated from 32
known associated SNPs, and in over 30,000 individuals the genetic association was
found to be more pronounced in those with a higher intake of sugar-sweetened
beverages than in those with a lower intake (Qi et al. 2012).

One  method  to  integrate  methylome  and  genomic  data  is  termed  HSM  analy- 
sis, whereby  DNA  methylation  is  measured  within  the  LD  block  of  a  disease 
association  SNP  (Bell  2011)  (Fig.  13.6).  The  methylation  levels  are  then  analysed 
not  by  case  versus  control,  but  by  risk  haplotype  status  and  a  linear  relationship 
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Fig.  13.6  Haplotype- specific  methylation  analysis  (from  Bell  (201 1),  Adapted  from  Personalized 
Medicine,  May  2011,  Vol.  8,  No.  3,  Pages  243-251  with  permission  of  Future  Medicine  Ltd). 
DNA  methylation  is  calculated  across  the  linkage  disequilibrium  (LD)  block  of  disease-associated 
SNP.  (a)  Illustration  of  a  GWAS  peak  locus  with  the  X-axis  representing  —  log10  P-values  for  SNPs 
(points:  red  above  significance  threshold;  blue  non- significant)  and  local  recombination  rate  as 
blue  lines  (cM/Mb)  with  peaks  demarking  the  LD  blocks  represented  (c)  in  Haploview  (Barrett 
et  al.  2005).  (b)  Figure  of  the  average  DNA  methylation  identified  across  the  association  LD 
block  locus 

between  the  three  genotypic  groupings  (homozygote  risk,  heterozygote  and 
homozygote  non-risk)  is  looked  for.  This  can  be  investigated  for  across  the  entire 
LD  block  for  potentially  strong  and  co-ordinated  effects  or  a  sliding  window 
analysis  with  variable- sized  windows  can  be  performed  to  isolate  potentially 
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stronger  effects  and  functional  regions.  The  power  of  this  integrative  approach  is 
that  it  uses  the  strength  of  the  large-scale  GWAS  studies  performed  in  1,000's  of 
samples,  and  then  looks  for  subtlety  between  these  alleles  according  to  their 
epigenetic  state.  This  analysis  was  performed  in  all  then  known  association 
regions  for  Type  2  Diabetes  in  a  study  in  2010  (Bell  et  al.  2010b).  A  significant 
relationship between methylation state and the FTO locus was identified, whereby
the obesity susceptibility haplotype possessed significantly higher levels of methylation.
This was localised within a 7.7 kb region, and the epigenomic architecture
here revealed it to possess a chromatin histone H3K4me1 enhancer signature in
tissues including skeletal muscle. The most extreme difference of ~10 % methylation
between homozygote risk and non-risk carriers was then isolated in a
narrow peak of 0.9 kb. The co-ordinated phase of CpG-creating SNPs across the risk
haplotype within this region, tagged by the CpG-SNP rs7202116, was found to
contribute strongly to this difference. The utility of this approach is
shown by the fact that statistically significant results were identified, across a 47 kb LD
block, using just the methylation data of 60 individuals. Thousands of individuals
would have been required to identify this as a disease-related case versus control
DMR without input of the genotype information.
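
The haplotype-stratified analysis described above can be illustrated with a minimal sketch. It assumes each individual is coded by the number of risk haplotypes carried (0, 1 or 2) and that a methylation level is available per CpG; the function names, the toy data and the fixed window size are illustrative assumptions and are not taken from Bell et al. (2010b).

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return slope, intercept, r


def sliding_window_slopes(risk_allele_counts, methylation, window, step=1):
    """Fit mean window methylation against risk-haplotype count for each window.

    methylation[i][j] is the methylation level of individual i at CpG j
    (CpGs ordered by genomic position). Returns (start_index, slope, r) tuples.
    """
    n_sites = len(methylation[0])
    results = []
    for start in range(0, n_sites - window + 1, step):
        window_means = [sum(row[start:start + window]) / window for row in methylation]
        slope, _, r = linear_fit(risk_allele_counts, window_means)
        results.append((start, slope, r))
    return results


# Hypothetical toy data: 6 individuals, 4 CpG sites across an LD block.
genotypes = [0, 0, 1, 1, 2, 2]  # copies of the risk haplotype
methylation = [
    [0.40, 0.50, 0.30, 0.60],
    [0.42, 0.52, 0.30, 0.58],
    [0.41, 0.55, 0.35, 0.60],
    [0.40, 0.57, 0.35, 0.59],
    [0.41, 0.60, 0.40, 0.60],
    [0.42, 0.62, 0.40, 0.61],
]
windows = sliding_window_slopes(genotypes, methylation, window=2)
strongest = max(windows, key=lambda w: abs(w[1]))  # window with the steepest slope
```

In a real analysis the slope would be accompanied by a significance test and a correction for the number of windows examined; both are omitted here for brevity.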

Subsequently  Yang  et  al.  performed  a  meta-analysis  of  approximately  170,000 
samples  to  try  to  identify  SNPs  associated  with  a  trait's  phenotypic  variability 
(Yang  et  al.  2012).  This  identified  that  the  methylatable  CpG-SNP  rs7202116, 
within  the  obesity-related  FTO  LD  block  locus,  was  strongly  associated  with  BMI 
phenotypic variance (Yang et al. 2012). It was the "G" allele of this SNP that
tagged the haplotype with higher, genetically driven methylation in the study
above and that itself created a CpG site. DNA methylation was postulated to be the
possible functional cause (Yang et al. 2012). A potential hypothesis to
connect these two findings is therefore that the increased variability is
facilitated by the presence of a CpG, or a number of CpGs, within a particular
functional region such as a TFBS. As has been proposed from the CTCF binding
data of Wang et al. (2012), methylation within such a region is coupled with
binding and therefore with downstream functional consequences. Consequently the
CpG sites facilitate higher variability through their ability to subtly modify methylation,
perhaps in a tissue- or developmental-specific timeframe, whilst the
allele lacking the CpG has only a more rigid set-point response and consequently
reduced variance.

Therefore  the  genetic  variability  due  to  CpG  gain  or  loss  may  be  an  important 
facilitatory  factor  for  environmental  influence  and  this  role  may  be  predicted  from 
phased  haplotype  data  and  methylome  analysis  in  any  tissue.  The  strong  influence 
of genetic polymorphism from CpG-SNPs on the methylome has been identified, as
mentioned (Shoemaker et al. 2010), though the functional relevance will be wide-ranging
depending upon genomic location. The tissue specificity of the epigenome
may then act on top of, or be facilitated by, this genetic influence.

Further integration with chromatin data also aids the functional dissection of
these genetic regions. The co-ordination of histone modification code data, as well
as binding sites of the insulator protein CTCF, has enabled broad-scale chromatin
states to be predicted (ChromHMM) (Ernst and Kellis 2012; Ernst et al. 2011)
(see Table 13.2). This may be used to delineate the functional relevance of methylome
changes within these chromatin-delineated regions, such as promoters, transcribed
gene bodies or enhancers, e.g., potential influence on eRNA (Zhou et al. 2011a;
Kim et al. 2010). The integration of GWAS haplotypes with chromatin state has
also begun to be explored within nine cell types (Ernst et al. 2011) and is available in
the HaploReg database (Ward and Kellis 2012). Subtle methylation variability
influencing TFBSs may be highly informative, if sufficiently powered studies can be
performed to detect it. ENCODE data on TFBSs in many cell types are now
available (Neph et al. 2012a, b; Spitz and Furlong 2012). In addition, the international
consortium projects IHEC and Roadmap have growing sets of publicly
available epigenome datasets, with more to come in the near future
(Zhou et al. 2011b).

Table 13.2 Annotation of the genome from chromatin signatures (ChromHMM (Ernst and Kellis
2012; Ernst et al. 2011))

[Table body: frequency (%) of each chromatin mark within each chromatin state, shown in bins of
<10, 11-20, 21-40, 41-60, 61-80, 81-90 and >90.
Columns (marks): CTCF, H3K27me3, H3K36me3, H4K20me1, H3K4me1, H3K4me2, H3K4me3, H3K27ac, H3K9ac.
Rows (states): Active Promoter; Weak Promoter; Inactive/Poised Promoter; Strong Enhancer (two states);
Weak/Poised Enhancer (two states); Insulator; Transcriptional Transition; Transcriptional Elongation;
Weak Transcribed; Polycomb Repressed; Heterochromatin - Low signal; Repetitive/CNV (two states).]


13.7    Methylome Analysis

DNA methylome biological analysis can be split into three major techniques:
affinity enrichment by use of antibodies, such as methylated DNA immunoprecipitation
(MeDIP); enzymatic, by the use of methylation-sensitive restriction enzymes;
or chemical conversion, usually the bisulphite (BiS) reaction (Schones and Zhao
2008). All three can be used to investigate an individual genetic locus, be
coupled with arrays for high-throughput analysis, or be coupled with second-generation
sequencing for genome-wide high-resolution methylome analysis.

The most common examples of these sequencing-based combinations include
MeDIP-seq, RRBS-seq (Reduced Representation BiSulphite sequencing) and
BiS-seq (BiSulphite sequencing). Depending upon what biological question is being
asked, and on the obvious cost implications, the various methylome analyses have different
advantages and disadvantages (Robinson et al. 2010). MeDIP obtains good coverage
of intermediate methylation regions, such as CGI shores, per read mapped, though
without single-CpG resolution. RRBS enables single-CpG resolution, but data are
largely only available in the CpG-dense CGI regions. The current gold standard is
whole genome shotgun BiS-seq, with the first human methylomes published using
this technique in 2009 (Lister et al. 2009). This study identified a considerable level of
non-CpG cytosine methylation in embryonic cells that was lost with differentiation.
The cost and read requirements of BiS-seq are still extreme, requiring >10^9
reads. In published BiS-seq studies to date in human, average alignable genomic
coverage of ~28x (Lister et al. 2009), 24.7x (Li et al. 2010), 34.3x (Zeng et al. 2012)
and 16x in sperm (Molaro et al. 2011) was reached. Another possible consideration
to reduce cost is whether a pooled analysis could be appropriate.
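
The relationship between read number and coverage quoted above follows from simple arithmetic, sketched below. The genome size (3.1 x 10^9 bp), the read length (100 bp) and the function names are assumptions chosen for illustration; real experiments lose additional reads to duplicates and to ambiguous alignment of bisulphite-converted sequence.

```python
import math


def mean_coverage(n_reads, read_length_bp, genome_size_bp, alignable_fraction=1.0):
    """Expected average depth: total aligned bases divided by genome size."""
    return n_reads * read_length_bp * alignable_fraction / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp, alignable_fraction=1.0):
    """Number of reads required to reach a target average depth."""
    return math.ceil(target_coverage * genome_size_bp / (read_length_bp * alignable_fraction))


GENOME_SIZE_BP = 3_100_000_000  # assumed approximate human genome size
READ_LENGTH_BP = 100            # assumed read length

depth_from_1e9_reads = mean_coverage(10**9, READ_LENGTH_BP, GENOME_SIZE_BP)
reads_for_30x = reads_needed(30, READ_LENGTH_BP, GENOME_SIZE_BP)
```

Under these assumptions, 10^9 reads of 100 bp correspond to roughly 32x average depth, which is of the same order as the coverage figures reported in the studies cited above.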

The computational analysis of genome-wide methylome data has to be closely
tailored to the technique used to generate the data. Examples include bespoke
pipelines for MeDIP-seq data, such as MeDUSA (Wilson et al. 2012), which
incorporates MEDIPS (Chavez et al. 2010), and detailed biological and
bioinformatics BiS-seq analysis protocols (Johnson et al. 2012). The
numerous computational methods have recently been reviewed excellently by Bock
(2012). Many BiS-seq protocols to date have not adequately dealt with the issue of
SNPs, particularly the significant issue of C > T SNP/unmethylated cytosine
ambiguity (Liu et al. 2012), although a BiS-seq SNP caller has recently been
published that correctly identified 96 % of SNPs with ~30x coverage
(Liu et al. 2012).
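
The C > T SNP/unmethylated cytosine ambiguity can be resolved in principle because bisulphite converts cytosine but leaves guanine untouched, so reads from the complementary strand still report the underlying genotype. The sketch below illustrates only this principle; it is not the algorithm of the published caller, and the function name and both thresholds are hypothetical.

```python
def classify_reference_c(same_strand_bases, opposite_strand_bases,
                         hom_cutoff=0.8, min_reads=4):
    """Separate a C>T SNP from an unmethylated C at a reference cytosine.

    same_strand_bases: calls from reads of the strand carrying the C
        (after conversion: 'C' = methylated C, 'T' = unmethylated C or T allele).
    opposite_strand_bases: calls from reads of the complementary strand,
        where guanine is unaffected ('G' = C allele, 'A' = T allele).
    hom_cutoff and min_reads are illustrative thresholds, not published values.
    """
    g = opposite_strand_bases.count("G")
    a = opposite_strand_bases.count("A")
    if g + a < min_reads:
        # Too little opposite-strand evidence to resolve the ambiguity.
        return {"genotype": "ambiguous", "methylation": None}
    t_allele_fraction = a / (g + a)
    if t_allele_fraction >= hom_cutoff:
        return {"genotype": "T/T", "methylation": None}
    if t_allele_fraction > 1 - hom_cutoff:
        # Heterozygous: same-strand T calls mix SNP and conversion signal.
        return {"genotype": "C/T", "methylation": None}
    c = same_strand_bases.count("C")
    t = same_strand_bases.count("T")
    level = c / (c + t) if c + t else None
    return {"genotype": "C/C", "methylation": level}
```

For example, a site read as T on every same-strand read is called an unmethylated C/C when the opposite strand shows only G, but a T/T SNP when the opposite strand shows only A.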

Whilst an isolated CpG variation, or MVP, identified in a genome-wide
data set cannot be dismissed as non-critical, it carries a higher risk of being a false
positive, especially given the large number of CpGs probed, and is functionally
more difficult to delineate. Therefore regional DMR methods using this individual
cytosine data from arrays or single-base resolution sequencing have been developed, such as
the "bump hunting" methodology (Jaffe et al. 2012), which looks for statistically outlying
regions of methylation, and the CpG_MPs analysis tool (Su et al. 2012), which uses a
hotspot extension algorithm to identify unmethylated and methylated regions.
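
The idea of moving from single-CpG differences to regional DMRs can be sketched as follows. This is a deliberately simplified run-finding heuristic, not the published bump hunting or CpG_MPs algorithm: it assumes a per-CpG methylation difference between groups has already been computed, and the function name, threshold, minimum number of sites and maximum gap are all hypothetical parameters.

```python
def find_candidate_dmrs(positions, diffs, threshold=0.1, min_sites=3, max_gap=500):
    """Group neighbouring CpGs with consistent methylation differences into regions.

    positions: sorted genomic coordinates of CpGs on one chromosome.
    diffs: per-CpG methylation difference between groups (e.g. case minus control).
    Returns (start, end, n_sites, mean_diff) for runs of at least min_sites CpGs
    that all exceed the threshold, share a sign, and lie within max_gap bp of
    their neighbour.
    """
    runs, current = [], []
    for i, d in enumerate(diffs):
        passes = abs(d) >= threshold
        extends = (
            passes and current
            and (d > 0) == (diffs[current[-1]] > 0)
            and positions[i] - positions[current[-1]] <= max_gap
        )
        if extends:
            current.append(i)
        else:
            if current:
                runs.append(current)
            current = [i] if passes else []
    if current:
        runs.append(current)
    return [
        (positions[r[0]], positions[r[-1]], len(r), sum(diffs[i] for i in r) / len(r))
        for r in runs if len(r) >= min_sites
    ]


# Hypothetical toy data: eight CpGs in two clusters.
positions = [100, 150, 220, 300, 2000, 2050, 2100, 2150]
diffs = [0.02, 0.12, 0.15, 0.11, 0.20, -0.12, -0.15, -0.13]
regions = find_candidate_dmrs(positions, diffs)
```

Requiring several concordant neighbouring CpGs reduces the chance that a single noisy site is reported, which is the motivation for regional methods given above; a full method would also assess the statistical significance of each region.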

13.8    Evolutionary Aspects in the Analysis of the Epigenome

Human uniqueness, in both phenotype and disease susceptibility, is the result
of interaction between the genome, environment, behaviour and culture (Varki
et al. 2008). Acquired epigenetic changes are highly significant in the evolution
of the malignant genome (Esteller 2007), and these forces can be similar over the
different timeframes of cell, population and species evolution (De and Babu 2010).
Comparative genomic analysis between species has been successful in identifying
potential functionally significant regions due to long-standing conservation
(Lindblad-Toh et al. 2011). However, comparative epigenomic analysis adds additional
layers of complexity, as species-specific genetic, epigenetic and environmental
influences may be driving any differences found. Therefore the epigenomic
variation identified within the same tissue between highly similar species may be a
means to dissect out these factors, and furthermore to delineate the loci in which
this variability occurs. In regions of strong sequence similarity there is a higher
probability of this change being environmentally driven, although potential long-range
cis- or trans-acting genetic differences cannot be excluded.

The comparative epigenomic approach to date has included vertebrate ChIP-seq
of TFBSs (Schmidt et al. 2010) and primate methylome analyses (Molaro
et al. 2011; Martin et al. 2011; Pai et al. 2011). Comparative BiS-seq of frontal
cortex between human and chimpanzee showed age- and sex-related changes, but
also that differentially methylated genes were enriched for neurological, psychological
and cancer-related disorders (Zeng et al. 2012).

With the identification of human-specific DMRs, a subset will possess significant
environmentally driven differences. More subtle potential intra-human deviation
in these known variable regions may identify environmental effects that could
be tested for association with disease, such as up- or down-regulated inflammatory
or developmental pathways. Interestingly, human species-specific DMRs in a
comparative tri-primate (human, chimpanzee and macaque) analysis, by MeDIP-seq
in peripheral blood, were found to be more prevalent in CGI shores than the
islands themselves (Wilson et al. in preparation), in a similar fashion to cancer and
tissue DMRs (Irizarry et al. 2009).

Genetic change in CpG presence may influence CpG island density, or lead to
the erosion of or accretion within shore regions, thus influencing the location of
these highlighted regions of potential variability. As discussed, because this
dinucleotide template is necessary to facilitate methylation, and so to enable or
decouple methylation from a functional role, isolated CpGs within TFBSs and
similar elements can still be important.
By genomic analysis of six primates, the subset of non-polymorphic human-specific
CpGs (~1.19 million), termed "CpG Beacons", was identified (Bell et al. 2012).
Significant clusters were found to be enriched for neurological and inflammatory
disease-associated loci. It may be that these CpGs are enriched for epigenetic
modifications in human-specific disease changes. The reduced statistical correction
required with this limited set may enable the identification of novel associations and
could be used to test for potential increased or reduced human-specific regulatory
effect in certain disease traits.

The dramatic rise in allergic disease indicates that immune pathways appear
remarkably susceptible to modern environmental influences (Martino
et al. 2011), that epigenetic insights into this rapid rise may be identified (Martino
and Prescott 2012), and that these may be human-specific. Human-specific CpG and
DMR loci, or similar approaches, can be used to look for evidence for hypotheses
such as this, as well as others such as early developmental effects on adult disease
(Gluckman et al. 2008), inflammatory roles in obesity (Greenfield et al. 2004) or
atherosclerosis (Varki et al. 2011; Libby et al. 2011).


13.9    Future  Developments 

Validation of the large number of potential disease-related methylation regions that
may arise, particularly from a first-stage array design, can be a practically difficult
task. Non-biased targeted amplification techniques that can be used on
BiS-converted DNA, such as RainDance (Tewhey et al. 2009), are now available and,
when coupled with second-generation sequencing, will enable deep
base-resolution targeted confirmation of these loci.

Once confirmed DMRs have been identified, functional assessment of these would
previously have relied on global epigenomic effectors, such as the demethylating agent
5-azacytidine, in cell-line models. Precisely targeted assessment by methylation
manipulation with artificial transcription factors (Rivenbark et al. 2012) or
transcription activator-like effector nucleases (TALENs) (Reyon et al. 2012) is now
being proposed and will permit a more exact understanding of epigenetic change in
these loci.

Base-resolution  analysis  of  further  modifications  of  DNA,  for  example 
hydroxymethylation  (5hmC),  has  recently  been  shown  to  be  possible  with  the 
chemical  conversion  technique,  detailed  by  Booth  et  al.,  of  selective  chemical 
oxidation  of  5hmC  to  5fC  with  subsequent  bisulphite  conversion  to  uracil  and 
then  amplification  to  thymine  (Booth  et  al.  2012).  This  pseudo-SNP  generation 
then  would  be  amenable  to  high-throughput  array  technology  analysis,  in  the  same 
fashion  as  for  DNA  methylation.  Recently  the  possible  importance  of  this  5hmC 
mark  has  been  highlighted  by  an  identified  reduction  across  melanoma  genomes 
(Lian et al. 2012), as well as increased levels within synaptic genes, exon-intron
boundaries, and constitutively spliced exons within brain tissue, indicating a potential
role in splicing in the central nervous system (Khare et al. 2012).
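
Because standard bisulphite treatment reads both 5mC and 5hmC as cytosine, whereas after oxidation only 5mC is read as cytosine, the 5hmC level at a site can be estimated by subtraction. The sketch below assumes read counts from paired standard and oxidative bisulphite libraries of the same sample; the function name and the clamping of negative estimates to zero are illustrative choices, not part of the cited protocol.

```python
def estimate_5mc_5hmc(bs_c_reads, bs_total_reads, oxbs_c_reads, oxbs_total_reads):
    """Estimate cytosine modification levels at one site from paired libraries.

    Standard bisulphite (BS): 'C' calls arise from 5mC and 5hmC.
    Oxidative bisulphite (oxBS): 'C' calls arise from 5mC only.
    """
    bs_level = bs_c_reads / bs_total_reads        # 5mC + 5hmC
    oxbs_level = oxbs_c_reads / oxbs_total_reads  # 5mC only
    # Sampling noise can make the difference negative; clamp at zero.
    hmc_level = max(0.0, bs_level - oxbs_level)
    return {"5mC": oxbs_level, "5hmC": hmc_level, "unmodified": 1.0 - bs_level}


# Hypothetical counts: 80/100 'C' calls after BS, 65/100 after oxBS.
levels = estimate_5mc_5hmc(80, 100, 65, 100)
```

Since 5hmC is typically a low-frequency mark, the subtraction is sensitive to sampling noise, so deep coverage in both libraries is needed for a stable estimate.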

As mentioned, the prospect of third-generation sequencing would be revolutionary
for the field of epigenetics by directly enabling the assessment of DNA
modifications without the necessity for DNA-degradative chemical modifications
(Flusberg et al. 2010; Clarke et al. 2009). Accurate detection of the low-frequency
modifications (5hmC, 5fC, 5caC and possibly others as yet undiscovered) would allow
these to be explored in more depth.

The clinical applications of DNA methylation profiling as a signature of human
disease have particular promise (Heyn and Esteller 2012). As well as providing an additional
dimension to integrate into genomic and transcriptomic data for precise pathological
analysis (Bell and Beck 2009), further possibilities include the non-invasive detection of
minute traces of cancer, or of signatures of inflammatory signals, from surrogate biological
fluids (Heyn and Esteller 2012). The high level of genetic mutation in epigenetic
regulators and chromatin remodelling genes, in both cancer (Dawson et al. 2012)
and developmental disorders (Kleefstra et al. 2006; Bienvenu and Chelly 2006),
signals the importance of the accurate control of the epigenomic dimension in these
complex disorders and suggests that further dysregulation will be found. Finally, in the
future, with increasingly detailed base-resolution understanding of the epigenome,
the prospect of "interventional epigenomics" may become possible. Owing to the
epigenome's dynamic plasticity, it may be more amenable to targeted therapeutic
manipulation than static somatic mutation (Dawson and Kouzarides 2012).


Chapter 14

Analytical Considerations for Epigenome-Wide Association Scans of Complex Traits

Jordana  T.  Bell 


Abstract Recent advances in molecular methods and next-generation sequencing
technologies have allowed for the detection of epigenetic variation at an unprecedented
level of resolution. With the availability of new genome-wide epigenetic
assays, particularly for DNA methylation, epigenetic studies of human complex
traits have moved towards epigenome-wide association scans (EWAS). Similar to
genome-wide association scans (GWAS), EWAS aim to perform a genome-wide
search for epigenetic variants that associate with complex phenotypes and have
potential to identify novel genes and molecular pathways in common disease.
However,  unlike  genetic  variation,  epigenetic  variation  can  be  dynamic,  which 
has  implications  for  EWAS  methodology  and  design.  This  chapter  discusses  the 
analytical  aspects  of  performing  EWAS  of  DNA  methylation  changes  in  complex 
traits,  as  well  as  the  potential  to  integrate  genetic  and  epigenetic  variation  in  the 
analysis  of  molecular  mechanisms  underlying  complex  phenotypes. 

14.1  Introduction 

Epigenome-wide  association  scans  (EWAS)  are  large-scale  studies  of  epigenetic 
changes  in  complex  phenotypes.  EWAS  aim  to  systematically  assess  epigenetic 
variation  throughout  the  genome,  and  test  for  association  between  epigenetic  levels 
and  complex  traits.  For  example,  a  recent  EWAS  of  rheumatoid  arthritis 
characterized  DNA  methylation  levels  genome-wide  and  observed  significantly 
lower  DNA  methylation  levels  at  the  major  histocompatibility  (MHC)  region  in 
rheumatoid arthritis cases, compared to controls (Liu et al. 2013). In many respects
EWAS take a similar approach to genome-wide association studies (GWAS);
however, they require further considerations in methodology, design, and
interpretation due to the dynamic nature of epigenetic levels over time. At present,
the majority of EWAS focus on DNA methylation, potentially because of the
availability of genome-wide technologies to assay DNA methylation marks compared
to assays available for other epigenetic mechanisms. With the advancement
of molecular technologies EWAS will likely soon include additional epigenetic
mechanisms, such as histone modifications and chromatin structure assays.
Recently,  EWAS  of  DNA  methylation  levels  in  complex  disease  have  identified 
disease-associated  differentially  methylated  regions  (DMRs)  for  multiple  traits 
(Bakulski  et  al.  2012;  Bell  et  al.  2012;  Breitling  et  al.  2011;  Cheung  et  al.  2010; 
Gervin  et  al.  2012;  Hasler  et  al.  2012;  Javierre  et  al.  2010;  Kaminsky  et  al.  2008; 
Mastroeni  et  al.  2010;  Rakyan  et  al.  2010,  2011a;  Teschendorff  et  al.  2010; 
Toperoff  et  al.  2012;  Zhao  et  al.  2012),  where  modest  changes  in  DNA  methylation 
levels  were  associated  with  disease  or  quantitative  trait  levels.  Furthermore,  it  has 
also  been  suggested  that  epigenetic  variants  may  not  only  influence  complex  trait 
levels  but  may  also  contribute  to  phenotype  variability  (Feinberg  and  Irizarry  2010), 
both  directly  or  through  genetic-epigenetic  interactions  (Yang  et  al.  2012).  This 
idea broadens the scope of potential EWAS methods towards integrative
frameworks that can incorporate trait variance effects, thus moving beyond standard
epigenetic-phenotype tests of association. The ultimate goal of EWAS is to
obtain  a  mechanistic  insight  into  the  biological  pathways  involved  in  phenotype 
susceptibility  and  progression. 
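
A standard epigenetic-phenotype test of association can be sketched as a per-site comparison of methylation between cases and controls, with a correction for the number of sites tested. The example below is a minimal illustration under stated assumptions: it uses Welch's t-statistic and a Bonferroni threshold, the number of tests (485,000) is chosen to match an array of roughly that many CpG sites, and the toy methylation values are hypothetical. It does not reproduce the analysis of any study cited here.

```python
from statistics import mean, variance


def welch_t(cases, controls):
    """Welch's t-statistic for a difference in mean methylation at one CpG site."""
    n_cases, n_controls = len(cases), len(controls)
    standard_error = (variance(cases) / n_cases + variance(controls) / n_controls) ** 0.5
    return (mean(cases) - mean(controls)) / standard_error


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


# Hypothetical methylation levels at one site (lower in cases than in controls).
case_levels = [0.30, 0.32, 0.28, 0.31]
control_levels = [0.40, 0.42, 0.38, 0.41]
t_statistic = welch_t(case_levels, control_levels)

# Assumed number of tests for an array of about 485,000 CpG sites.
genome_wide_alpha = bonferroni_threshold(0.05, 485_000)
```

The resulting threshold of about 1 x 10^-7 shows why modest methylation differences require either large samples or a reduced set of candidate regions to reach epigenome-wide significance.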


14.2    DNA  Methylation 

To  date,  the  majority  of  epigenetic  studies  have  focused  on  DNA  methylation, 
which has unique features that may influence EWAS study design. DNA methylation
is one of the most common and stable epigenetic mechanisms. DNA methylation
can have downstream effects on gene expression and consequently play an
important  role  in  normal  development  and  disease.  Within  the  context  of  EWAS,  a 
distinction  should  be  made  between  the  level  of  DNA  methylation  at  the  individual 
cell  level,  and  the  level  of  DNA  methylation  within  a  tissue  sample  from  an 
individual  subject,  which  consists  of  a  population  of  cells  (Fig.  14.1).  It  is  typically 
DNA  methylation  levels  within  a  sample  from  an  individual  that  represent  the 
sampling  unit  for  the  majority  of  EWAS  approaches.  DNA  methylation  levels  and 
changes  in  DNA  methylation  levels  can  be  heritable,  both  through  mitosis  and 
meiosis. However, methylation can also change over time and has been linked to
environmental factors, such as smoking (Breitling et al. 2011). DNA methylation in
mammals  occurs  on  the  cytosine  base,  predominantly  but  not  exclusively  in  the 
context  of  cytosine-phosphate-guanine  (CpG)  dinucleotides  (Ramsahoye 
et al. 2000). CpG dinucleotides tend to cluster in CpG islands (CGIs), which are
regions of high CpG density. In general, DNA methylation of promoters is negatively
associated with gene expression levels across the genome in cancer (see
Jones and Baylin 2002) and healthy tissues (Eckhardt et al. 2006; Song et al. 2005),

Fig. 14.1 An illustrative example of DNA methylation levels in the context of an epigenetic
model of disease susceptibility. The left panel represents DNA methylation levels within a single
cell. The middle panel represents tissue samples from subjects, where the overall sample DNA
methylation level is expressed as p, or the allele frequency of the methylated allele in the
population of cells within the tissue sample. In this hypothetical example the overall DNA
methylation level in the tissue sample contributes to disease status in a case-control sample.
The right panel represents the distribution of locus-specific DNA methylation levels in a sample of
individuals for un-methylated, hemi-methylated, and methylated loci, which is representative of
that observed on the Illumina Infinium Methylation microarrays


while gene-body methylation has been positively correlated with expression (Aran
et al. 2011; Jjingo et al. 2012). DNA methylation is also known to be not only tissue
specific  but  also  cell  type  specific  (Rakyan  et  al.  2008;  Thompson  et  al.  2010; 
Heijmans  et  al.  2009;  Ollikainen  et  al.  2010),  which  is  one  of  the  key  features  to 
consider  in  EWAS  design  choices.  DNA  methylation  is  strongly  correlated  with 
other  epigenetic  marks  such  as  histone  modification  (Cedar  and  Bergman  2009)  and 
other  markers  of  open  and  closed  chromatin  structure  (Dunham  et  al.  2012).  These 
findings  suggest  that  shared  mechanisms  of  epigenetic  regulation  exist  (Cedar  and 
Bergman  2009;  Fuks  2005;  Zhou  et  al.  2010),  and  it  has  been  proposed  that 
transcriptionally silent chromatin may be a mark of de novo DNA methylation
(Bird 2002), implying that at least in some cases DNA methylation may be the mark
rather than the trigger of chromatin-state change.


14.2.1    Genetic and Environmental Effects on DNA Methylation

DNA  methylation  at  specific  sites  can  be  affected  by  genetic  (Bell  et  al.  2012), 
environmental  (Christensen  et  al.  2009;  Gronniger  et  al.  2010),  and  stochastic 
variation  (Bjornsson  et  al.  2008;  Fraga  et  al.  2005).  DNA  methylation  heritability 
has  been  shown  in  twin  studies,  where  monozygotic  (MZ)  twins  have  more  similar 
DNA  methylation  patterns  than  dizygotic  (DZ)  twins  (Kaminsky  et  al.  2009),  and 
from  observations  that  methylation  patterns  segregate  in  families  (Bjornsson 
et al. 2008). Recently, many genetic variants have been identified as methylation
quantitative trait loci (me-QTLs), which are genetic polymorphisms that significantly
associate with DNA methylation levels, predominantly in cis (Bell
et al. 2011; Gibbs et al. 2010; Gamazon et al. 2012; Numata et al. 2012;
Gertz et al. 2011; Schilling et al. 2009). DNA methylation levels have also been
reported to be associated with environmental variants within individuals, such as
smoking (Breitling et al. 2011). Environmental exposures in the parental generation,
such as maternal smoking during pregnancy (Joubert et al. 2012) and parental
diet (Heijmans et al. 2008; Vucetic et al. 2010), have also been associated with an
individual's  DNA  methylation  level  at  particular  loci.  Therefore,  DNA  methylation 
levels  may  change  in  response  to  environmental  variation  over  time  both  within 
individuals  and  within  families  across  generations  (Wong  et  al.  2010b;  Fraga 
et  al.  2005;  Bjornsson  et  al.  2008). 
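
The twin comparison mentioned above can be turned into a rough heritability estimate. The sketch below uses the classical Falconer approximation, h^2 = 2(r_MZ - r_DZ), which is an assumption made here for illustration; the cited twin studies may have used other estimators. The function names and the toy co-twin methylation values are hypothetical.

```python
def pearson_r(pairs):
    """Pearson correlation between the first and second member of each pair."""
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def falconer_heritability(r_mz, r_dz):
    """Falconer's approximation: twice the difference in twin correlations."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical methylation levels at one CpG for co-twin pairs.
mz_pairs = [(0.20, 0.22), (0.50, 0.48), (0.80, 0.79), (0.35, 0.37)]
dz_pairs = [(0.20, 0.40), (0.50, 0.45), (0.80, 0.60), (0.35, 0.55)]

r_mz = pearson_r(mz_pairs)
r_dz = pearson_r(dz_pairs)
h2 = falconer_heritability(r_mz, r_dz)
```

With these toy values the MZ correlation is close to 1 and the DZ correlation is about 0.75, giving a heritability estimate of about 0.5 for this site.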


14.2.2    DNA  Methylation  Assays 


Many  approaches  can  be  used  to  profile  DNA  methylation  levels  and  a  recent 
review  by  Laird  (2010)  gives  an  in-depth  overview  of  their  key  features  and 
potential  sources  of  bias.  Briefly,  commonly  used  platforms  in  EWAS  can  be 
broadly  divided  into  three  groups  (Rakyan  et  al.  2011b;  Bell  and  Spector  2011; 
Laird  2010):  (1)  microarrays,  (2)  enrichment-based  platforms,  and  (3)  bisulfite- 
sequencing approaches. At present, the most commonly used microarray is the
Illumina Infinium HumanMethylation450 (Illumina 450k) (Dedeurwaerder
et al. 2011), which succeeds the Illumina Infinium
HumanMethylation27 (Illumina 27k) (Bibikova et al. 2009) and the Illumina
GoldenGate Methylation Cancer Panel I (Illumina GoldenGate) arrays, and is
based on genotyping of bisulfite-converted DNA. The assay profiles DNA methylation
at ~485,000 CpG-sites predominantly located near genes, out of ~10^7 possible
CpG-sites across the genome. Unlike the Illumina 27k, the Illumina 450k does not
predominantly target CpG-sites in CGIs, but also examines CGI shores, shelves,
and regions of the genome of lower CpG density. In terms of potential sources of
bias, as this approach is based on bisulfite-conversion, incomplete bisulfite-conversion
and bisulfite-PCR can both be sources of bias, as can the inability
to distinguish between different types of cytosine methylation (Huang et al. 2010),
such as 5-methyl-cytosine (5-mC) and 5-hydroxy-methyl-cytosine (5-hmC).
Despite incomplete genome coverage, the Illumina 450k can be a cost-effective
method to characterize DNA methylation patterns, and can allow standardized
comparisons and meta-analysis across EWAS on a common platform (Rakyan
et al. 2011b).
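
On bisulfite-based arrays of this kind, the sample-level methylation at a CpG site is usually summarised as a proportion derived from the methylated and unmethylated probe intensities. The sketch below shows the commonly used beta value and its logit transform (the M-value). The offset of 100 is the commonly quoted default and is treated here as an assumption, as are the function names and the example intensities.

```python
import math


def beta_value(methylated_intensity, unmethylated_intensity, offset=100):
    """Proportion of methylated signal at a CpG site, between 0 and 1.

    The offset stabilises the ratio when both intensities are low.
    """
    m = max(methylated_intensity, 0)
    u = max(unmethylated_intensity, 0)
    return m / (m + u + offset)


def m_value(beta):
    """Logit (base 2) transform of a beta value, often used for statistical testing."""
    return math.log2(beta / (1 - beta))


# Hypothetical probe intensities for one CpG site in one sample.
example_beta = beta_value(3000, 1000)
example_m = m_value(example_beta)
```

The beta value corresponds to the sample-level proportion of methylated alleles in the population of cells assayed, which is why un-methylated, hemi-methylated and methylated loci cluster near 0, 0.5 and 1 respectively.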

Enrichment-based  platforms  capture  and  sequence  the  methylated  parts  of  the 
genome. Commonly used platforms include methylated DNA immunoprecipitation
sequencing (MeDIP-seq), methylated DNA capture by affinity purification
sequencing (MECAP-seq), and methylated DNA-binding domain sequencing
(MBD-seq). MeDIP-seq uses an antibody specific for 5-mC to extract methylated
fragments from sonicated DNA (Weber et al. 2005, 2007), while MECAP-seq
(Brinkman et al. 2010) and MBD-seq (Weber et al. 2005; Zhang et al. 2006) use
methyl-binding domain proteins to obtain methylated DNA. Affinity-based enrichment
methods may exhibit a CpG density-dependent bias in binding affinity (Down
et al. 2008) and a potential bias due to GC content. It is also difficult to obtain a
measure of the overall level of DNA methylation in the sample. Importantly, these
approaches do not provide single-CpG-level resolution of methylation; resolution is
limited by the size of the DNA fragment (usually ~200 bp). On the other hand, the
methods are rapid and efficient, provide good genome-wide coverage, and are able
to measure allele-specific DNA methylation levels. Furthermore, specific
enrichment-based approaches (for example, MeDIP-seq) can be designed to target
specifically 5-mC or 5-hmC, and previous studies have used these approaches to
examine non-CpG methylation levels (Bock et al. 2010; Li et al. 2010).

Bisulfite-sequencing  methods  include  whole-genome  bisulfite  sequencing 
(WGBS)  (Cokus  et  al.  2008;  Lister  and  Ecker  2009)  and  reduced  representation 
bisulfite  sequencing  (RRBS)  (Gu  et  al.  2010).  WGBS  is  currently  the  gold  standard 
for  assaying  DNA  methylation,  because  it  can  provide  absolute  levels  of  DNA 
methylation  at  single-CpG-site  resolution  at  good  genome-wide  coverage,  as  well 
as  an  overall  genome-wide  level  of  DNA  methylation  for  the  sample,  and  it  can  also 
measure  allele-specific  methylation  and  DNA  methylation  levels  at  non-CpG-sites. 
However,  this  approach  is  also  not  ideal  because  certain  regions  of  the  genome  are 
difficult  to  bisulfite  sequence,  either  during  the  protocol  or  due  to  the  alignment  of 
bisulfite-converted  reads.  Potential  sources  of  bias  here  include  incomplete 
bisulfite-conversion  and  bisulfite-PCR  bias,  as  well  as  the  inability  to  distinguish 
between  5-mC  and  5-hmC  (Huang  et  al.  2010). 

Sample  requirements  for  DNA  quality  and  quantity  may  impact  the  choice  of 
assay  (Laird  2010).  For  example,  microarrays  require  less  material  (for  example 
500 ng for Illumina450k) (Bibikova et al. 2009; Dedeurwaerder et al. 2011) compared
to enrichment-based or enzyme-based platforms (Laird 2010). The methylation
profiling assays described here have been applied in multiple studies, but not all
strategies  have  genome-wide  coverage  and  different  technical  and  biological 
factors  (for  example,  CpG  density)  may  bias  the  estimated  DNA  methylation  levels. 
Therefore,  it  is  important  to  validate  DNA  methylation  levels  at  key  regions  of 
interest using multiple assays.

14.3    Design  and  Analysis  of  EWAS

EWAS study design and analysis require careful consideration due to the unique features of
DNA  methylation,  which  can  be  tissue  specific,  variable  over  time,  and  may  both 
contribute  to  and  result  from  phenotypic  changes.  Furthermore,  assay  sensitivity, 
coverage,  and  study  design  will  also  impact  power  to  detect  disease-associated 
DMRs  (Fig.  14.2). 


14.3.1    Tissue  Specificity 

DNA  methylation  variants  can  be  tissue  specific  or  shared  across  tissues  (Gibbs 
et  al.  2010;  Gamazon  et  al.  2012;  Numata  et  al.  2012).  It  is  important  to  identify  and 
sample  the  tissue  that  is  most  relevant  for  the  trait.  However,  often  the  most 
appropriate  tissue  may  not  be  available  or  easily  accessible,  and  for  many  studies 
only whole-blood DNA will be available.

Fig. 14.2 Methodological considerations for EWAS study designs. The top panel lists the major
factors that may influence DNA methylation levels. The middle panel represents a list of factors
that may impact the measurement of DNA methylation levels. The bottom panel represents
additional factors that may play a role in identifying and interpreting the observed associations
between DNA methylation levels and disease. Three common types of EWAS study designs are
incorporated within this example.

The suitability of whole blood as a
surrogate for DNA methylation in other tissues is extensively debated (Heijmans
and  Mill  2012;  Rakyan  et  al.  2011b;  Tsai  et  al.  2012).  At  CpG-sites  where  DNA 
methylation  is  heritable  or  has  been  linked  to  genetic  variants  or  me-QTLs, 
methylation  is  more  stable  over  time  and  conserved  across  tissues.  However, 
CpG-sites  with  me-QTLs  constitute  a  small  proportion  of  sites  genome-wide.  Due 
to  the  availability  of  whole-blood  samples,  whole-blood  DNA  methylation  is  often 
used  for  first-stage  EWAS  analyses,  and  subsequent  follow-up  aims  to  assess 
methylation  tissue  specificity  at  the  disease  DMRs  in  multiple  tissues  of  relevance 
to disease. To address the concern of cell heterogeneity in whole-blood
EWAS, whole-blood DNA methylation levels
should be corrected for blood cell subtype proportions where such data are available
(Adalsteinsson  et  al.  2012;  Houseman  et  al.  2012).  Additional  EWAS  designs 
with  consideration  of  tissue  specificity  apply  to  cancer  studies,  where  one  approach 
would  be  to  compare  DNA  methylation  levels  from  healthy  and  affected  tissue  from 
the  same  individual,  as  well  as  include  healthy  tissue  from  unaffected  individuals  as 
another  control  sample. 
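
As a minimal illustration of the cell-composition adjustment described above, the Python sketch below removes the linear effect of a single measured cell-type fraction from methylation levels at one CpG-site. It is a plain regression residual and not the reference-based method of Houseman et al. (2012); the function name adjust_for_cell_fraction, the single-covariate setup, and the example values are assumptions made for illustration only.

```python
from statistics import mean

def adjust_for_cell_fraction(methylation, cell_fraction):
    """Remove the linear effect of one cell-type fraction from methylation values.

    methylation   : methylation levels (0-1) at one CpG-site, one per sample
    cell_fraction : measured fraction of one blood cell subtype, one per sample
    Returns adjusted values re-centred on the original mean methylation.
    """
    mx, my = mean(cell_fraction), mean(methylation)
    sxx = sum((x - mx) ** 2 for x in cell_fraction)
    if sxx == 0:
        # No variation in the covariate: nothing to adjust for.
        return list(methylation)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cell_fraction, methylation))
    slope = sxy / sxx  # ordinary least-squares slope
    return [y - slope * (x - mx) for x, y in zip(cell_fraction, methylation)]


# Hypothetical data: methylation driven entirely by cell composition.
fractions = [0.40, 0.50, 0.60, 0.70]
levels = [0.2 + 0.5 * f for f in fractions]
adjusted = adjust_for_cell_fraction(levels, fractions)
```

In this toy example the adjusted values collapse onto the sample mean, because all of the variation was explained by the cell fraction. In real data several cell subtypes would usually be entered jointly as covariates in the association model.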

14.3.2    Bias  and  Covariates  in  Large-Scale  DNA 
Methylation  Datasets 

All  DNA  methylation  assays  may  be  affected  by  technical  sources  of  noise  and  bias. 
For example, batch effects have been described in both microarray (Bell et al. 2011)
and sequencing studies (Laird 2010). For Illumina DNA methylation microarrays,
samples  processed  during  the  same  bisulfite  conversion  step,  samples  assayed  on 
the  same  Illumina  chip,  or  samples  located  on  the  same  position  across  chips  may 
exhibit  more  similar  profiles  of  DNA  methylation,  resulting  in  batch  effects. 
Furthermore,  known  covariates  such  as  age,  cell  subtype  sample  heterogeneity, 
and environmental factors such as smoking and diet will also impact DNA methylation
levels at specific loci. EWAS should be designed to minimize these effects,
that  is,  to  exclude  the  possibility  of  these  technical  and  biological  covariates 
becoming  confounders  for  the  disease  of  interest.  One  standard  approach  towards 
this goal is to randomly allocate samples across potential batches, and to
perform  analyses  that  adjust  for  measured  and  unmeasured  covariates.  Initial  data 
checks  should  examine  the  distribution  of  methylation  signals  across  potential 
batches. Correlation patterns in genome-wide DNA methylation levels should be
assessed  across  the  sample  of  individuals  and  within  cases  and  controls  separately, 
as  well  as  across  genome-wide  loci  and  within  autosomes  and  sex  chromosomes 
separately. Several methods to assess heterogeneity in high-throughput
technologies should be applied; for example, previous studies have used surrogate
variable analysis in gene expression (Leek and Storey 2007), or principal component
analysis and other approaches (see Leek et al. 2010). Analyses should be run on
normalized data, but at the key regions of interest, analysis using
unprocessed signals should also be performed (Laird 2010). Computational
approaches to control for confounders and technical artifacts can increase power to
detect  biological  effects.  For  example,  accounting  for  measured  and  unmeasured 
sources  of  heterogeneity  in  gene  expression  data,  using  surrogate  variable  analyses, 
increased  the  power  to  detect  disease-associated  gene  expression  profiles  in  large- 
scale  samples  (Leek  and  Storey  2007).  The  application  of  these  methods  to  high- 
throughput  DNA  methylation  datasets  can  reduce  spurious  signals  and  incorrect 
conclusions,  and  increase  the  reproducibility  of  the  results. 
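
The random allocation of samples across batches recommended above can be sketched as follows. This is one possible scheme rather than a prescribed protocol: cases and controls are shuffled, spread as evenly as possible across chips, and then shuffled again within each chip so that disease status is tied neither to chip nor to position on the chip. The function name allocate_to_chips, the default of 12 samples per chip, and the sample identifiers are assumptions for illustration.

```python
import random
from collections import Counter

def allocate_to_chips(status, chip_size=12, seed=0):
    """Randomly assign samples to (chip, position), balancing cases and controls.

    status : dict mapping sample ID -> "case" or "control"
    Returns a dict mapping sample ID -> (chip index, position on chip).
    """
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    cases = sorted(s for s, g in status.items() if g == "case")
    controls = sorted(s for s, g in status.items() if g == "control")
    rng.shuffle(cases)
    rng.shuffle(controls)

    # Interleave the two shuffled groups so every chip gets a similar mix.
    ordered = []
    while cases or controls:
        if cases:
            ordered.append(cases.pop())
        if controls:
            ordered.append(controls.pop())

    layout = {}
    for chip, start in enumerate(range(0, len(ordered), chip_size)):
        block = ordered[start:start + chip_size]
        rng.shuffle(block)  # randomize positions within the chip
        for position, sample in enumerate(block):
            layout[sample] = (chip, position)
    return layout


# Hypothetical study: 24 cases and 24 controls.
status = {f"S{i:02d}": ("case" if i < 24 else "control") for i in range(48)}
layout = allocate_to_chips(status)
cases_per_chip = Counter(chip for s, (chip, _) in layout.items()
                         if status[s] == "case")
```

The same idea extends to other potential batch variables, such as bisulfite conversion plate or processing date, by treating each as a further level of allocation.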


14.3.3    EWAS  Study  Designs 

Several  EWAS  study  designs  have  been  proposed  (Rakyan  et  al.  2011b;  Tsai 
et  al.  2012).  Similar  to  GWAS,  cross-sectional  population-  and  family-based 
EWAS  designs  can  be  adopted  and  many  such  cohorts  are  already  available  from 
GWAS  efforts.  In  a  population-based  EWAS  such  as  a  case-control  study,  cases 
and controls are selected retrospectively and DNA methylation patterns are compared
to detect disease-related DMRs. If the aim of EWAS is to identify extreme
methylation  changes,  DNA  methylation  comparison  of  pooled  cases  and  controls 
may  also  be  pursued  (Docherty  et  al.  2009).  As  in  GWAS,  sample  heterogeneity 
and  population  stratification  are  also  likely  to  impact  population-based  EWAS 
findings,  and  statistical  measures  to  assess  and  control  for  these  effects  should  be 
taken  into  account.  For  example,  at  a  small  proportion  of  CpG-sites  across  the 
genome  DNA  methylation  levels  are  heritable  and  may  be  subject  to  population 
stratification  effects.  Aside  from  stratification,  it  is  also  possible  that  EWAS  in 
cases  and  controls  will  detect  DMRs  at  heritable  methylation  sites.  Therefore, 
cross-sectional  population-based  EWAS  may  detect  genetic  associations  with  the 
phenotype  that  are  mediated  through  DNA  methylation. 

In  cross-sectional  family-based  designs,  the  disease-discordant  identical  twin 
design  is  often  used  in  epigenetic  studies  of  disease  (Bell  and  Saffery  2012).  MZ 
twins  share  nearly  all  genetic  variants  and  many  environmental  and  lifestyle 
factors,  but  have  different  epigenomes.  Disease-discordant  MZ  twins  have  been 
used  to  assess  the  contribution  of  environmental,  lifestyle,  and  epigenetic  risk 
factors  to  many  complex  traits  and  phenotypes  (see  Bell  and  Spector  2011).  The 
aim  of  twin-based  EWAS  is  to  clarify  potential  nongenetic  epigenetic  changes 
present  in  the  case,  but  not  control,  twin.  Other  family-based  study  designs  include 
comparisons  across  sib  and  half-sib  pairs,  and  across  parent-offspring  pairs,  trios, 
and  multigenerational  families,  which  may  reveal  the  extent  of  genetic  regulation  of 
the  epigenetic  mark  at  a  particular  locus  within  and  across  generations.  However,  it 
is  often  difficult  to  obtain  sufficient  family-based  samples  for  EWAS,  particularly 
for rare diseases; for example, only 1 in 250 individuals has an identical twin
(Bulmer  1970). 

The  main  disadvantage  of  cross-sectional  population-  and  family-based  EWAS 
is  that  the  timing  of  the  methylation  modification  with  respect  to  disease  onset 
cannot be established, because methylation can be both a cause and a
consequence of disease. The only EWAS study design that can help address this in
disease  susceptibility  is  the  longitudinal  design,  where  DNA  methylation  levels  at  a 
locus  of  interest  are  assessed  prior  to  and  after  the  development  of  disease.  These 
studies  aim  to  identify  DNA  methylation  changes  that  occur  prior  to  disease  onset 
and are therefore potentially causal to disease, and distinguish these from methylation
variants that occur only after disease onset and are therefore likely
consequences  of  disease  progression.  In  addition,  they  allow  for  the  assessment 
of  the  temporal  variability  in  DNA  methylation  itself,  which  is  also  of  biological 
interest.  However,  longitudinal  samples  are  difficult  to  obtain.  Furthermore,  the 
optimal time of sampling with respect to disease onset is unknown. For example, is a
10-year window before and after disease onset the optimal timescale? The
answer  is  likely  to  depend  on  the  age  at  disease  onset,  because  there  is  evidence  that 
rates  of  age-related  changes  in  DNA  methylation  are  higher  in  younger  compared  to 
older  individuals  (Talens  et  al.  2010;  Tsai  et  al.  2012;  Wong  et  al.  2010a).  However, 
in  practice  nearly  all  longitudinal  studies  will  be  limited  by  the  availability  of 
longitudinal  cohort  samples. 

Combining  multiple  types  of  EWAS  study  designs  is  perhaps  the  most  useful 
approach  to  epigenetic  studies  of  disease.  For  example,  discovery-stage  EWAS 
may  start  with  a  large  case-control  sample.  If  disease  DMRs  are  identified,  the 
second  stage  of  analyses  may  address  the  potential  causes  of  the  DMRs,  that  is, 
whether  the  DMRs  have  a  genetic  or  a  nongenetic  origin.  This  follow-up  may 
include  integrated  genetic-epigenetic  phenotype  analysis  in  the  original 
case-control  sample,  and  disease-discordant  twin  analysis  of  the  peak  DMRs. 
The  third  stage  would  be  longitudinal  analyses  to  establish  whether  these  DMRs 
are  temporally  stable,  and  potentially  causal  or  consequences  of  disease  (Fig.  14.2). 


14.3.4    EWAS  Power

Power  to  detect  DMRs  in  EWAS  depends  on  many  factors  including  study  design, 
sample  size,  DMR  effect,  interindividual  variability  in  DNA  methylation  levels, 
methylation  assay  coverage  and  sensitivity,  and  stability  of  the  DNA  methylation 
variant  over  time.  Relatively  few  studies  have  assessed  EWAS  power  to  date.  In 
twins, EWAS published to date have used either small samples (n < 5) with
high-resolution approaches such as bisulfite sequencing (Baranzini et al. 2010) or lower
resolution assays such as Illumina450K or Illumina27K, with modest sample sizes
(n = 10-25) (Bell et al. 2012; Dempster et al. 2010; Gervin et al. 2012; Hasler
et al. 2012; Rakyan et al. 2011a). Several studies have reported locus-specific
power estimates for the disease-discordant twin design. Kaminsky et al. (2008)
estimated  power  for  a  particular  locus  in  a  genome-wide  methylation  assay 
targeting  CpG  island  regulatory  elements.  The  results  suggested  that  reasonable 
power  to  detect  moderate  effects  could  be  obtained  with  a  small  sample  of  disease- 
discordant  twins.  For  example,  greater  than  80  %  power  to  detect  a  1.15-fold 
change in the methylation signal was obtained with a sample size of 21 twin
pairs. However, the majority of currently used methylation assays assess single
CpG-sites. Formal power calculations for more extensive genome-wide coverage at
single CpG-site resolution have not yet been reported in twins. Preliminary
estimates  report  low  (35  %)  to  reasonable  (>80  %)  power  to  detect  DMRs  at 
specific  CpG-sites  at  methylation  differences  of  5-6  %  between  affected  and 
unaffected  twins  in  20-22  disease-discordant  twin  pairs  (Bell  et  al.  2012;  Dempster 
et  al.  2010). 

In  case-control  EWAS,  two  recent  studies  have  explored  power  to  detect  DMRs. 
Wang (2011) compared the performance of a novel approach to standard parametric
and  nonparametric  tests  in  EWAS,  and  suggested  that  greater  power  can  be 
achieved  if  not  only  mean  differences  but  also  the  variance  of  methylation  levels 
are  taken  into  account.  The  study  observed  that  86.6  %  power  was  achieved  to 
detect  a  DMR  with  a  mean  case-control  difference  of  9  %  in  methylation  levels, 
using  250  cases  and  250  controls.  Rakyan  et  al.  (2011b)  also  found  that  taking  into 
account  not  only  the  difference  but  also  the  variance  in  methylation  levels  will 
impact  power  results.  The  authors  estimated  that  88  %  power  can  be  obtained  to 
detect a DMR with methylation odds ratios (methOR) of 1.25 and sample size of
800 cases and 800 controls at a significance level of 10^-6 if the locus is primarily
methylated,  but  power  is  reduced  (<10  %)  with  greater  variability  in  methylation 
levels.  The  methOR  is  defined  as  the  odds  for  a  strand  of  DNA  from  a  case  to  be 
methylated  divided  by  the  same  odds  for  controls.  Power  estimates  of  80  %  could 
be achieved with a high methOR of 2.11 in at least 200 cases and 200 controls, but
for  a  more  realistic  methOR  of  1.49,  increased  sample  sizes  of  at  least  800  cases  and 
800  controls  were  required  (Rakyan  et  al.  2011b). 
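
To make the quantities in this section concrete, the Python sketch below computes a methOR from two methylation proportions, following the definition given above, and gives a rough normal-approximation power estimate for a two-sided comparison of mean methylation between equal numbers of cases and controls. It does not reproduce the methods or the figures of Wang (2011) or Rakyan et al. (2011b); the function names, the assumed common standard deviation of 0.15, and the chosen significance level are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def meth_odds_ratio(p_case, p_control):
    """methOR: odds that a DNA strand is methylated in cases,
    divided by the same odds in controls (proportions strictly between 0 and 1)."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))

def two_group_power(mean_diff, sd, n_per_group, alpha):
    """Approximate power of a two-sided z-test for a difference in mean
    methylation, with n_per_group cases and n_per_group controls and a
    common standard deviation sd (an assumption of this sketch)."""
    z = NormalDist()
    se = sd * sqrt(2.0 / n_per_group)      # standard error of the difference
    crit = z.inv_cdf(1 - alpha / 2)        # two-sided critical value
    shift = abs(mean_diff) / se            # standardized effect
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


# Hypothetical scenario: 9 % mean difference, assumed SD of 0.15.
power_small = two_group_power(mean_diff=0.09, sd=0.15, n_per_group=100, alpha=1e-7)
power_large = two_group_power(mean_diff=0.09, sd=0.15, n_per_group=250, alpha=1e-7)
```

Under these assumptions power rises steeply with sample size, and increasing the assumed standard deviation lowers it, which mirrors the observation above that variability in methylation levels strongly affects power.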

14.3.5  EWAS  Significance  Thresholds 

EWAS  significance  threshold  levels  should  account  for  multiple  testing.  One 
approach is to consider the total number of CpG-sites examined. For example, the
Illumina 450k array includes ~485,000 CpG-sites across the genome, resulting in a
Bonferroni-adjusted significance threshold of P ~ 10^-7 (0.05/485,000). However, DNA methylation
levels at nearby (within 1-2 kb) CpG-sites are correlated (co-methylation)
(Bell et al. 2011; Eckhardt et al. 2006), which suggests that a Bonferroni correction
is likely overconservative. Criteria for genome-wide significance thresholds in
EWAS  have  not  yet  been  established.  Therefore,  alternative  methods  to  assess 
genome-wide  significance  should  be  considered,  including  false  discovery  rate 
(FDR)  correction  for  multiple  comparisons,  or  permutation-based  approaches. 
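
The two families of correction mentioned above can be sketched in a few lines of Python. The first function gives the Bonferroni threshold for a given number of tests; the second applies the Benjamini-Hochberg step-up procedure, which is one common FDR method and is used here as an assumption, since the text does not name a specific FDR procedure. The function names and the example P values are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of tests declared significant at the given FDR,
    using the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose P value falls under its own threshold.
        if p_values[i] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Threshold for an array with ~485,000 CpG-sites.
array_threshold = bonferroni_threshold(485_000)

# Hypothetical P values for five CpG-sites.
significant = benjamini_hochberg([0.5, 0.001, 0.9, 0.02, 0.012])
```

Because the step-up rule keeps every test ranked below the last one that passes, a site can be declared significant even if its own P value exceeds its rank-specific threshold. Permutation-based thresholds, which preserve the correlation between nearby CpG-sites, are an alternative when co-methylation is strong.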

14.3.6  Validation 

To date, most DNA methylation studies have used bisulfite sequencing for validation
of selected DMRs. However, large-scale EWAS may require custom validation
assays. A custom DNA methylation validation assay may be
bisulfite-sequencing based, and would include the identified disease-related DMRs,
genes  near  known  GWAS  signals  for  this  disease,  and  potentially  also  other  related 
disease  DMRs,  certain  tissue-specific  CpG-sites  of  tissue  relevance  to  the  disease  of 
interest,  as  well  as  heritable  CpG-sites,  CpG-sites  associated  with  environmental 
variants,  and  highly  variable  methylation  regions.  On  the  other  hand,  designing 
disease-specific validation assays may also be appropriate; for example, a
cancer-specific panel may target CpG island shores, cancer-DMRs, and cancer-related
hypo-methylation  blocks  (Hansen  et  al.  2011;  Berman  et  al.  2012),  while 
common-disease  panels  may  include  promoter-specific  regions  and  previously 
identified  disease-associated  DMRs. 


14.3.7    Replication

Replication  guidelines  for  GWAS  are  clearly  established  and  will  be  even  more 
necessary  in  EWAS  due  to  the  dynamic  nature  of  DNA  methylation  variation  and 
the potential bias and noise in measuring DNA methylation levels genome-wide.
Further  complications  that  EWAS  replication  should  address  are  DMR  tissue 
specificity  and  underlying  cause.  For  example,  for  a  genetically  driven  DMR 
identified  in  a  discovery  case-control  study,  replication  in  disease-discordant 
twins may not be appropriate. Replication considerations and
guidelines for EWAS have been discussed in more detail elsewhere
(see  Rakyan  et  al.  2011b). 


14.4    After  an  EWAS  Study:  Further  Investigations 

Once  EWAS  DMRs  have  been  identified,  several  follow-up  analyses  should  be 
considered  as  routine  steps.  These  include  validation  of  the  DNA  methylation 
signal  by  a  different  assay,  replication  of  the  DMR  effect  in  an  independent  sample, 
and  longitudinal  studies  to  assess  the  temporal  stability  and  timing  of  the  DMR 
relative  to  disease  onset.  After  these  initial  analyses,  further  work  can  focus  on 
understanding  the  role  of  the  DMR  in  the  trait. 


14.4.1    Genetic-Epigenetic  Analyses 

Integrating  genetic  and  epigenetic  information  in  a  shared  analytical  framework  can 
help  to  assess  whether  the  DMR  mediates  genotype-phenotype  associations.  In  this 
case,  DMR  effects  are  likely  to  be  detected  in  a  sample  of  unrelated  individuals,  but 
may  not  be  identified  in  samples  of  disease-discordant  MZ  twins.  Examples  of 
integrative analyses include genotype-epigenotype effects in rheumatoid arthritis
(Liu et al. 2013) and combining me-QTL, GWAS, and DMR findings in age-related
phenotypes (Bell et al. 2012). Although genetic-epigenetic associations may suggest that the DMR is causal,
this is not necessarily the case, because it is possible that genetic associations lead
to  the  phenotype,  which  in  turn  drives  changes  in  methylation  and  alters  gene 
expression  as  a  consequence. 


14.4.2  Causality 

To  date,  longitudinal  EWAS  are  the  only  design  that  can  inform  causality  of  DMR 
effects  on  the  phenotype.  If  longitudinal  samples  are  not  available,  case-control  and 
twin  EWAS  follow-up  in  this  area  may  include  Mendelian  randomization 
approaches (Relton and Davey Smith 2010), intervention trials, and causal inference
analyses, for example, using structural equation modeling, Bayesian network
analysis,  or  multivariate  phenotype  analysis  methods.  However,  in  all  cases,  to 
conclusively  infer  causal  effects  of  DNA  methylation  on  the  trait,  experimental 
evidence will be necessary. For some disease DMRs, it may be possible to manipulate
DNA methylation levels at the DMR of interest in model organisms in the
appropriate  tissue  and  at  the  appropriate  developmental  stage,  but  for  other  traits 
such  as  psychiatric  disorders  experimental  studies  may  prove  challenging. 


14.4.3    Gene  Expression 

Gene  expression  at  the  genes  surrounding  the  DMR  variant  can  help  clarify 
function  (Heijmans  and  Mill  2012).  If  expression  data  are  not  available  for  relevant 
tissues,  comparing  methylation  and  expression  across  multiple  tissues  may  inform 
the  tissue  specificity  of  the  variant.  The  presence  of  transcription  factor-binding 
sites  in  the  vicinity  of  the  DMR  variant  may  also  prove  informative.  Comparison  of 
multiple  levels  of  epigenetic  regulation  in  a  wider  genomic  region  of  the  DMR  may 
also  inform  the  role  of  the  variant,  as  previously  discussed  (Heijmans  and 
Mill  2012). 


14.4.4   Allele-Specific  Methylation 

Recent studies have surveyed levels of allele-specific methylation (ASM), and
show  that  ASM  occurs  not  only  at  known  imprinted  loci  but  is  also  prevalent 
throughout  the  genome  (Fang  et  al.  2012;  Kerkel  et  al.  2008).  However,  the 
majority  of  EWAS  to  date  assume  that  the  overall  level  of  DNA  methylation  at  a 
locus  of  interest  may  influence  the  phenotype,  yet  more  complex  epigenetic  risk 
models can also exist. For example, in allele-specific methylation, only one allele at
a heterozygous locus may be methylated, and this methylation could alter gene
activity  in  cis  and  affect  the  phenotype  of  interest.  Such  effects  may  not  be  detected 
in  standard  EWAS  using  assays  that  measure  overall  levels  of  methylation  at  a 
locus, such as the Illumina450k array. Technologies that can detect allele-specific
methylation  (for  example,  MeDIP-seq  and  WGBS)  will  be  useful  to  establish  the 
extent  of  ASM  effects  on  phenotypes. 
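
A small Python sketch illustrates why an assay that reports only the overall methylation level can miss ASM. It tallies methylation calls from sequencing reads that have already been assigned to one of two alleles at a heterozygous site; the function name methylation_by_allele, the read representation, and the two example loci are assumptions for illustration and do not correspond to any particular pipeline.

```python
def methylation_by_allele(reads):
    """Summarize methylation at one CpG-site from allele-assigned reads.

    reads : iterable of (allele, is_methylated) pairs
    Returns (overall methylated fraction, {allele: methylated fraction}).
    """
    totals, methylated = {}, {}
    for allele, is_meth in reads:
        totals[allele] = totals.get(allele, 0) + 1
        methylated[allele] = methylated.get(allele, 0) + (1 if is_meth else 0)
    n_reads = sum(totals.values())
    if n_reads == 0:
        raise ValueError("no reads supplied")
    overall = sum(methylated.values()) / n_reads
    per_allele = {a: methylated[a] / totals[a] for a in totals}
    return overall, per_allele


# Hypothetical locus with ASM: allele A fully methylated, allele G unmethylated.
asm_locus = [("A", True)] * 10 + [("G", False)] * 10
# Hypothetical locus without ASM: both alleles half methylated.
uniform_locus = [("A", True), ("A", False)] * 5 + [("G", True), ("G", False)] * 5

asm_overall, asm_alleles = methylation_by_allele(asm_locus)
uni_overall, uni_alleles = methylation_by_allele(uniform_locus)
```

Both loci have the same overall methylation level, yet only the per-allele summary separates the allele-specific pattern from uniform partial methylation.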


14.4.5   EWAS  for  Phenotype  Variability 

It  has  been  proposed  that  epigenetic  variants  could  influence  not  only  phenotype 
level  but  also  phenotype  variance  (Feinberg  and  Irizarry  2010).  Similar  effects  have 
been  observed  for  genetic  variants  and  several  GWAS  of  phenotype  variability 
have  been  performed  across  species  (Hulse  and  Cai  2013;  Ronnegard  and  Valdar 
2011;  Surakka  et  al.  2012;  Yang  et  al.  2012).  For  example,  recently  a  genetic 
variant in the FTO gene was found to influence variability in body mass
index (BMI) in humans (Yang et al. 2012) and several mechanisms were
proposed  to  explain  allelic  effects  on  BMI  variance,  including  epigenetic  effects. 

EWAS study designs for trait variance can include comparisons of DNA methylation
to phenotype variance in large-scale samples of unrelated individuals, and
comparisons  of  MZ  twin-pair  discordance  in  DNA  methylation  and  phenotype. 
Furthermore,  it  is  also  worth  pursuing  genotype-epigenotype  integrative  analyses 
in  this  setting  because  genetic  effects  on  phenotype  variability  may  be  mediated  by 
epigenetic  variation  (Feinberg  and  Irizarry  2010).  It  is  also  possible  that  a  genetic 
variant  that  impacts  phenotype  variance  interacts  with  an  epigenetic  variant,  that  is, 
DNA  methylation  may  determine  phenotype  levels  only  in  individuals  of  genotypes 
associated  with  greater  phenotypic  variance.  This  presents  potential  for  novel  types 
of  analyses  further  integrating  genetic  and  epigenetic  data  to  understand  the 
mechanisms  involved  in  human  complex  traits. 
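
One simple way to test for a difference in phenotype variance between two groups of unrelated individuals, for example those with low versus high methylation at a CpG-site, is a permutation test on the difference in sample variances. The Python sketch below is an illustrative stand-in and not the method used in the studies cited above; the function names, the number of permutations, and the example data are assumptions.

```python
import random

def _variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def variance_permutation_test(group_a, group_b, n_perm=1000, seed=0):
    """Permutation P value for a difference in variance between two groups.

    Each group is first centred on its own mean, so that a shift in mean
    phenotype level is not mistaken for a difference in variance.
    """
    rng = random.Random(seed)
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    centred = [v - mean_a for v in group_a] + [v - mean_b for v in group_b]
    n_a = len(group_a)
    observed = abs(_variance(centred[:n_a]) - _variance(centred[n_a:]))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(centred)  # reassign centred values to groups at random
        stat = abs(_variance(centred[:n_a]) - _variance(centred[n_a:]))
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical phenotype values: one tightly clustered group, one dispersed.
stable = [0.50 + 0.001 * (i % 3 - 1) for i in range(30)]
variable = [0.50 + 0.2 * (1 if i % 2 else -1) for i in range(30)]
p_diff = variance_permutation_test(stable, variable)

# Same spread, different mean: no variance difference expected.
base = [0.1 + 0.02 * i for i in range(20)]
shifted = [v + 0.3 for v in base]
p_same = variance_permutation_test(base, shifted)
```

The test relies on the centred values being exchangeable between groups, which is itself an assumption. For MZ twin designs the analogous quantity would be the within-pair difference in phenotype compared against the within-pair difference in methylation.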


14.5  Conclusion 

EWAS provide promising methods to detect disease-associated epigenetic variation,
but also highlight methodological challenges in studying dynamic epigenetic
marks.  These  include  the  need  to  perform  longitudinal  EWAS,  as  well  as  validation 
and  replication  in  large  samples.  Ultimately,  integrative  analysis  across  multiple 
sources of genomic, epigenomic, functional, and phenotypic data will help disentangle
the biological mechanisms involved in human complex disease.